Background

Various notational systems have been used to encode classes of chemical units by 
assigning a unique code to each chemical unit in the class. For example, a conventional 
notational system for encoding amino acids assigns a single letter of the alphabet to each
known amino acid. A polymer of chemical units may be represented using such a notational
system using a set of codes corresponding to the chemical units. Such notational systems 
have been used to encode polymers, such as proteins, in a computer-readable format. A 
polymer that has been represented in such a computer-readable format according to a 
notational system may be stored and processed by a computer. 

Conventional notational schemes for representing chemical units have represented the
chemical units as characters (e.g., A, T, G, and C for nucleic acids), and have represented
polymers of chemical units as sequences or sets of characters. Various operations may be
performed on such a notational representation of a chemical unit or a polymer composed of
chemical units. For example, a user may search a database of chemical units for a query
sequence of chemical units. In such a case, the user typically provides a character-based
notational representation of the sequence in the form of a sequence of characters, which is
compared against the character-based notational representations of sequences of chemical
units stored in the database. Character-based searching algorithms, however, are typically
slow because such algorithms search by comparing individual characters in the query
sequence against individual characters in the sequences of chemical units stored in the
database. The speed of such algorithms is therefore related to the length of the query
sequence, resulting in particularly poor performance for long query sequences.
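
As a rough illustration of why character-by-character comparison scales with query length, the following minimal sketch counts the comparisons made by a naive search. The function name and the toy sequences are illustrative assumptions and are not taken from any particular database system.

```python
from typing import Tuple


def naive_find(query: str, target: str) -> Tuple[int, int]:
    """Return (index of first match or -1, number of character comparisons)."""
    comparisons = 0
    # Try every alignment of the query against the stored sequence.
    for start in range(len(target) - len(query) + 1):
        matched = True
        for offset, ch in enumerate(query):
            comparisons += 1
            if target[start + offset] != ch:
                matched = False
                break
        if matched:
            return start, comparisons
    return -1, comparisons


# Toy example: a three-character query against a six-character sequence.
position, cost = naive_find("GAT", "AAGATC")
```

In the worst case the comparison count grows with the product of the query length and the stored sequence length, which is the behavior the paragraph above describes.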

The study of molecular and cellular biology is focused on the macroscopic structure of
cells. We now know that cells have a complex microstructure that determines the functionality
of the cell. Much of the diversity associated with cellular structure and function is due to the
ability of a cell to assemble various building blocks into diverse chemical compounds. The
cell accomplishes this task by assembling polymers from a limited set of building blocks
referred to as monomers. The key to the diverse functionality of polymers lies in the
primary sequence of the monomers within the polymer, which is integral to understanding the
basis for cellular function, such as why a cell differentiates in a particular manner or how a
cell will respond to treatment with a particular drug.

The ability to identify the structure of polymers by identifying their sequence of
monomers is integral to the understanding of each active component and the role that
component plays within a cell. By determining the sequences of polymers it is possible to
generate expression maps, to determine what proteins are expressed, to understand where
mutations occur in a disease state, and to determine whether a polysaccharide has better
function or loses function when a particular monomer is absent or mutated.

Summary

Polymers may be characterized by identifying properties of the polymers and
comparing those properties to reference polymers, a process referred to herein as property
encoded nomenclature (PEN). In one embodiment, the properties are encoded using a binary
notation system, and the comparison is accomplished by comparing the binary representations
of polymers. For instance, in one aspect a sample polymer is subjected to an experimental
constraint to modify the polymer, and the modified polymer is compared to a reference database
of polymers to identify a population of polymers having a property that is the same as or
similar to a property of the sample polymer. The method may be repeated until the population
of polymers in the reference database is reduced to one and the identity of the sample polymer
is known.

In a system including a database of properties of polymers of chemical units, a method
for determining the composition of a sample polymer of chemical units having a known
molecular weight and length is provided according to one aspect of the invention. The
method includes the steps of:

(A) selecting, from the database, candidate polymers of chemical units having the
same length as the sample polymer of chemical units and having molecular
weights similar to the molecular weight of the sample polymer of chemical
units;

(B) performing an experiment on the sample polymer of chemical units;

(C) measuring properties of the sample polymer of chemical units resulting from
the experiment; and

(D) eliminating, from the candidate polymers of chemical units, polymers of
chemical units having properties that do not correspond to the experimental
results.

In some embodiments the method also includes the step of:

(E) repeatedly performing the step (D) until the number of candidate polymers of
chemical units falls below a predetermined threshold.
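
Steps (A) through (E) can be sketched as a simple filtering loop. This is only a minimal illustration under stated assumptions: the record layout (`name`, `length`, `mass`, `properties`), the one-unit mass tolerance, the property names, and every value in the toy database are hypothetical and chosen purely for demonstration.

```python
from typing import Any, Dict, List, Tuple

Polymer = Dict[str, Any]


def select_candidates(database: List[Polymer], length: int, mass: float,
                      tolerance: float = 1.0) -> List[Polymer]:
    """Step (A): keep entries with the same length and a similar molecular weight."""
    return [p for p in database
            if p["length"] == length and abs(p["mass"] - mass) <= tolerance]


def eliminate(candidates: List[Polymer], prop: str, measured: Any) -> List[Polymer]:
    """Step (D): drop candidates whose stored property disagrees with the measurement."""
    return [p for p in candidates if p["properties"].get(prop) == measured]


def identify(database: List[Polymer], length: int, mass: float,
             measurements: List[Tuple[str, Any]], threshold: int = 2) -> List[Polymer]:
    """Steps (A)-(E): filter until fewer than `threshold` candidates remain."""
    candidates = select_candidates(database, length, mass)
    # Each (property, value) pair stands for the outcome of steps (B) and (C).
    for prop, measured in measurements:
        if len(candidates) < threshold:
            break
        candidates = eliminate(candidates, prop, measured)
    return candidates


# Hypothetical reference database; all numbers are made up.
database = [
    {"name": "P1", "length": 4, "mass": 1000.2, "properties": {"sulfates": 6, "cut_by_x": True}},
    {"name": "P2", "length": 4, "mass": 1000.6, "properties": {"sulfates": 6, "cut_by_x": False}},
    {"name": "P3", "length": 4, "mass": 1000.9, "properties": {"sulfates": 5, "cut_by_x": True}},
    {"name": "P4", "length": 4, "mass": 1080.0, "properties": {"sulfates": 6, "cut_by_x": True}},
    {"name": "P5", "length": 5, "mass": 1000.4, "properties": {"sulfates": 6, "cut_by_x": True}},
]

remaining = identify(database, length=4, mass=1000.5,
                     measurements=[("sulfates", 6), ("cut_by_x", True)])
```

With a threshold of two, the loop stops as soon as a single candidate (or none) remains, mirroring the case in which the method is used to identify one polymer.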

In other aspects the invention is a method for identifying a population of polymers of
chemical units having the same property as a sample polymer of chemical units. The method
includes the steps of determining a property of a sample polymer of chemical units, and
comparing the property of the sample polymer to a reference database of polymers of known
sequence and known properties to identify a population of polymers of chemical units having
the same property as the sample polymer of chemical units, wherein the reference database of
polymers includes identifiers corresponding to the chemical units of the polymers, each of the
identifiers including a field storing a value corresponding to the property.

In one embodiment the step of determining a property of the sample polymer involves
the use of mass spectrometry, such as, for example, matrix assisted laser desorption ionization
mass spectrometry (MALDI-MS), electrospray MS, fast atom bombardment mass
spectrometry (FAB-MS) and collision-activated dissociation mass spectrometry (CAD), to
determine the molecular weight of the polymer. MALDI-MS, for instance, may be used to
determine the molecular weight of the polymer with an accuracy of approximately one
Dalton.

The step of identifying a property of the polymer in other embodiments may involve
the reduction in size of the polymer into pieces of several units in length that may be detected
by strong ion exchange chromatography. The fragments of the polymer may be compared to
the reference database polymers.

According to other aspects, the invention is a method for identifying a subpopulation
of polymers having a property in common with a sample polymer of chemical units. The
method involves the steps of applying an experimental constraint to the polymer to modify the
polymer, detecting a property of the modified polymer, identifying a population of polymers
of chemical units having the same molecular length as the sample polymer, and identifying a
subpopulation of the identified population of polymers having the same property as the
modified polymer by eliminating, from the identified population of polymers, polymers
having properties that do not correspond to the modified polymer. The steps may be repeated
on the modified polymer to identify a second subpopulation within the subpopulation of
polymers having a second property in common with the twice modified polymer. Each of the
steps may then be repeated until the number of polymers within the subpopulation falls below
a predetermined threshold. The method may be performed to identify the sequence of the
polymer. In this case the predetermined threshold of polymers within the subpopulation is
two polymers.

In yet another aspect, the invention is a method for identifying a subpopulation of
polymers having a property in common with a sample polymer of chemical units. The
method involves the steps of applying an experimental constraint to the polymer to modify the
polymer, detecting a first property of the modified polymer, identifying a population of
polymers of chemical units having a second property in common with the sample polymer,
and identifying a subpopulation of the identified population of polymers having the same first
property as the modified polymer by eliminating, from the identified population of polymers,
polymers having properties that do not correspond to the modified polymer.

In one embodiment the experimental constraints applied to the polymer are different
for each repetition. The experimental constraint may be any manipulation which alters the
polymer in such a manner that it will be possible to derive structural information about the
polymer or a unit of the polymer. In some embodiments the experimental constraint applied
to the polymer may be any one or more of the following constraints: enzymatic digestion, e.g.,
with an exoenzyme, an endoenzyme, or a restriction endonuclease; chemical digestion; chemical
modification; interaction with a binding compound; chemical peeling (i.e., removal of a
monosaccharide unit); and enzymatic modification, for instance sulfation at a particular
position with a heparin sulfate sulfotransferase.

The property of the polymer that is detected by the method of the invention may be
any structural property of a polymer or unit. For instance the property of the polymer may be
the molecular weight or length of the polymer. In other embodiments the property may be the
compositional ratios of substituents or units, type of basic building block of a polysaccharide,
hydrophobicity, enzymatic sensitivity, hydrophilicity, secondary structure and conformation
(e.g., position of helices), spatial distribution of substituents, ratio of one set of modifications
to another set of modifications (e.g., relative amounts of 2-O sulfation to N-sulfation or ratio
of iduronic acid to glucuronic acid), and binding sites for proteins.

The properties of the modified polymer may be detected in any manner possible, which
depends on the property and polymer being analyzed. In one embodiment the step of
detection involves mass spectrometry such as matrix assisted laser desorption ionization mass
spectrometry (MALDI-MS), electrospray MS, fast atom bombardment mass spectrometry
(FAB-MS) and collision-activated dissociation mass spectrometry (CAD). Alternatively, the
step of detection involves strong ion exchange chromatography, for example, if the polymer
has been digested into several smaller fragments composed of several units each.

The method is based on a comparison of the sample polymer with a population of
polymers of the same length or having at least one property in common. In some
embodiments the population of polymers of chemical units includes every polymer sequence
having the molecular weight of the sample polymer. In other embodiments the population of
polymers of chemical units includes less than every polymer sequence having the molecular
weight of the sample polymer. According to some embodiments the step of identifying
includes selecting the population of polymers of chemical units from a database including
molecular weights of polymers of chemical units. Preferably the database includes identifiers
corresponding to chemical units of a plurality of polymers, each of the identifiers including a
field storing a value corresponding to a property of the corresponding chemical unit.

According to another aspect of the invention a method for compositional analysis of a
sample polymer is provided. The method includes the steps of applying an experimental
constraint to the sample polymer to modify the sample polymer, detecting a property of the
modified sample polymer, and comparing the modified sample polymer to a reference
database of polymers of identical size as the sample polymer, wherein the polymers of the
reference database have also been subjected to the same experimental constraint as the sample
polymer, and wherein the comparison provides a compositional analysis of the sample polymer.

In some embodiments the compositional analysis reveals the number and type of units 
within the polymer. In other embodiments the compositional analysis reveals the identity of a 
sequence of chemical units of the polymer. 

Similarly to the aspects of the invention described above, the properties of the polymer
may be detected in any manner possible and will depend on the particular property and
polymer being analyzed. In one embodiment the step of detection involves mass spectrometry
such as matrix assisted laser desorption ionization mass spectrometry (MALDI-MS),
electrospray MS, fast atom bombardment mass spectrometry (FAB-MS) and collision-activated
dissociation mass spectrometry (CAD). Preferably the experimental constraint applied to the
polymer is an enzymatic or chemical reaction which involves incomplete enzymatic digestion
of the polymer, and the steps of the method are repeated until the number of polymers
within the reference database falls below a predetermined threshold. Alternatively, the step of
detection involves capillary electrophoresis, particularly when the experimental constraint
applied to the polymer involves complete degradation of the polymer into individual chemical
units.

In one embodiment the reference database includes identifiers corresponding to
chemical units of a plurality of polymers, each of the identifiers including a field storing a
value corresponding to a property of the corresponding chemical unit.

According to yet another aspect of the invention a method for sequencing a polymer is
provided. The method includes the steps of applying an experimental constraint to the
polymer to modify the polymer, detecting a property of the modified polymer, identifying a
population of polymers having the same molecular length as the sample polymer and having
molecular weights similar to the molecular weight of the sample polymer, identifying a
subpopulation of the identified population of polymers having the same property as the
modified polymer by eliminating, from the identified population of polymers, polymers
having properties that do not correspond to the modified polymer, and repeating the steps of
applying an experimental constraint, detecting a property and identifying a subpopulation by
applying additional experimental constraints to the polymer and identifying additional
subpopulations of polymers until the number of polymers within the subpopulation is one and
the sequence of the polymer may be identified.

In another aspect the invention relates to a method for identifying a
polysaccharide-protein interaction, by contacting a protein-coated MALDI surface with a
polysaccharide-containing sample to produce a polysaccharide-protein-coated MALDI surface,
removing unbound polysaccharide from the polysaccharide-protein-coated MALDI surface, and
performing MALDI mass spectrometry to identify the polysaccharide that specifically
interacts with the protein coated on the MALDI surface.

In one embodiment a MALDI matrix is added to the polysaccharide-protein-coated
MALDI surface. In other embodiments an experimental constraint may be applied to the
polysaccharide bound on the polysaccharide-protein-coated MALDI surface before
performing the MALDI mass spectrometry analysis. The experimental constraint applied to
the polymer in some embodiments is digestion with an exoenzyme or digestion with an
endoenzyme. In other embodiments the experimental constraint applied to the polymer is
selected from the group consisting of restriction endonuclease digestion; chemical digestion;
chemical modification; and enzymatic modification.

Each of the limitations of the invention can encompass various embodiments of the 
invention. It is, therefore, anticipated that each of the limitations of the invention involving 
any one element or combinations of elements may be included in each aspect of the invention. 



Brief Description Of The Drawings

FIG. 1 is a dataflow diagram of a system for sequencing a polymer. 
FIG. 2 is a flow chart of a process for sequencing a polymer. 

FIG. 3 is a flow chart of a process for sequencing a polymer using a genetic algorithm. 
FIG. 4A-D is a set of diagrams depicting notation schemes for branched chain
analysis.

FIG. 5 is a mass line diagram. 

FIG. 6 is a mass-line diagram for (A) Polysialic Acid with NAN and (B) Polysialic 
Acid with NGN. 

FIG. 7 is a graph (A) depicting cleavage by Hep III of either G(o), l(O) or hs(0)
linkages, and a graph (B) depicting the same study as in (A) but where cleavage was performed
with Hep I.

FIG. 8 is a graph depicting MALDI-MS analysis of the extended core structures 
derived from enzymatic treatment of a mixture of bi- and triantennary structures. 

FIG. 9 is a graph depicting MALDI-MS analysis of the PSA polysaccharide. (A) Intact
polysaccharide structure. (B) Treatment of [A] with sialidase from A. ureafaciens. (C) Digest
of [B] with galactosidase from S. pneumoniae. (D) Digest of [C] with
N-acetylhexosaminidase from S. pneumoniae. (E) Table of the analysis scheme with schematic
structure and theoretical molecular masses. [O] = mannose; [0]= fucose; [□]=
N-acetylglucosamine; [□]= galactose; and [A]=N-acetylneuraminic acid. Peaks marked with an
asterisk are impurities, and the analyte peak is detected both as M-H (m/z 2369.5) and as a
monosodiated adduct (M+Na-2H, m/z 2392.6).

FIG. 10 is a graph depicting the results of enzymatic degradation of the saccharide
chain directly off of PSA. (A) PSA before the addition of exoenzymes. (B) Treatment of (A)
with sialidase results in a mass decrease of 287 Da, consistent with the loss of one sialic acid
residue. (C) Treatment of (B) with galactosidase. (D) Upon digestion of (C) with
hexosaminidase, a decrease of 393 Da indicates the loss of two N-acetylglucosamine residues.

FIG. 11 is a graph depicting the results of treatment of biantennary and triantennary
saccharides with endoglycanase F2. (A) Treatment of the biantennary saccharide results in a
mass decrease of 348.6, indicating cleavage between the GlcNAc residues. (B) Treatment of
the triantennary saccharide with the same substituents results in no cleavage, showing that
EndoF2 primarily cleaves biantennary structures. (C) EndoF2 treatment of heat denatured
PSA. There is a mass reduction of 1709.7 Da in the molecular mass of PSA (compare B4C
and B3a), indicating that the normal glycan structure of PSA is biantennary.

Detailed Description

The invention relates in some aspects to methods for characterizing polymers to
identify structural properties of the polymers, such as the charge, the nature and number of
units of the polymer, the nature and number of chemical substituents on the units, and the
stereospecificity of the polymer. The structural properties of polymers may provide useful
information about the function of the polymer. For instance, the properties of the polymer
may reveal the entire sequence of units of the polymer, which is useful for identifying the
polymer. Similarly, if the sequence of the polymer was previously unknown, the structural
properties of the polymer are useful for comparing the polymer to known polymers having
known functions. The properties of the polymer may also reveal that a polymer has a net
charge or has regions which are charged. This information is useful for identifying
compounds that the polymer may interact with or predicting which regions of a polymer may
be involved in a binding interaction or have a specific function.

Many methods have been described in the prior art for identifying polymers and in
particular for identifying the sequence of units of polymers. Once the sequence of a polymer
is identified, the sequence information is stored in a database and may be used to compare the
polymer with other sequenced polymers. Databases such as GENBANK enable the storage
and retrieval of information relating to the sequences of nucleic acids which have been
identified by researchers all over the world. These databases typically store information using
notational systems that encode classes of chemical units by assigning a unique code to each
chemical unit in the class. For example, a conventional notational system for encoding amino
acids assigns a single letter of the alphabet to each known amino acid. Such databases
represent a polymer of chemical units using a set of codes corresponding to the chemical
units. Searches of such databases have typically been performed using character-based
comparison algorithms.

New methods for identifying structural properties of polymers which can utilize
bioinformatics and which differ from the prior art methods of assigning a character to each
unit of a polymer have been discovered. These methods are referred to as PEN (property
encoded nomenclature). In one aspect, the invention is based on the identification and
characterization of properties of a polymer, rather than units of the polymer, and the use of
numeric identifiers to classify those properties and to facilitate information processing relating
to the polymer.
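
One way to picture a numeric, property-based identifier is as a small bit field in which each bit records one property of a unit. The sketch below is an assumption-laden illustration and not the encoding defined by the invention: the three-bit layout, the constant names, and the helper functions are all hypothetical. It uses properties mentioned elsewhere in this description (iduronic versus glucuronic acid, 2-O sulfation, N-sulfation).

```python
from typing import List

# Hypothetical bit layout for one disaccharide unit (an assumption for illustration).
IDURONIC = 0b100  # set: iduronic acid; clear: glucuronic acid
SULF_2O = 0b010   # set: 2-O sulfated
SULF_N = 0b001    # set: N-sulfated


def encode_unit(iduronic: bool, sulf_2o: bool, sulf_n: bool) -> int:
    """Pack three yes/no properties of a unit into a single integer code."""
    code = 0
    if iduronic:
        code |= IDURONIC
    if sulf_2o:
        code |= SULF_2O
    if sulf_n:
        code |= SULF_N
    return code


def has_all(code: int, mask: int) -> bool:
    """True if every property selected by `mask` is present in `code`."""
    return (code & mask) == mask


def find_units(codes: List[int], mask: int) -> List[int]:
    """Return positions of units that carry all properties in `mask`."""
    return [i for i, code in enumerate(codes) if has_all(code, mask)]


# A toy three-unit polymer expressed as property codes rather than characters.
polymer = [
    encode_unit(True, True, False),
    encode_unit(False, False, True),
    encode_unit(True, True, True),
]
hits = find_units(polymer, IDURONIC | SULF_2O)
```

Because each property occupies a fixed bit, a property query reduces to one bitwise AND per unit instead of a character-by-character comparison.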

The ability to identify properties of polymers and to manipulate the information
concerning the properties of the polymer provides many advantages over prior art methods of
characterizing polymers and bioinformatics. For instance, the methods of the invention may
be used to identify structural information and analyze complex polymers such as
polysaccharides which were previously very difficult to analyze using prior art methods.

The heterogeneity and the high degree of variability of the polysaccharide building
blocks have hindered prior art attempts to sequence these complex molecules. With the
advent of extremely sensitive techniques like High Pressure Liquid Chromatography (HPLC),
Capillary Electrophoresis (CE) and Mass Spectrometry (MS) to isolate and characterize large
biomolecules, significant advances have been made in isolating and purifying polysaccharide
fragments containing specific sequences, but extensive experimental manipulation is still
required to identify sequence information. Additionally, in most of these approaches,
considerable prior information about the sequence is required in order to design the
experimental manipulations that will enable the sequencing of the polysaccharide. The methods
of the invention provide simple and rapid methods for identifying sequence information. Many
other advantages will be clear from the description of the preferred embodiments set forth below.

A "polymer" as used herein is a compound having a linear and/or branched backbone
of chemical units which are secured together by linkages. In some but not all cases the
backbone of the polymer may be branched. The term "backbone" is given its usual meaning
in the field of polymer chemistry. The polymers may be heterogeneous in backbone
composition, thereby containing any possible combination of polymer units linked together,
such as peptide-nucleic acids. In some embodiments the polymers are homogeneous in
backbone composition and are, for example, a nucleic acid, a polypeptide, a polysaccharide, a
carbohydrate, a polyurethane, a polycarbonate, a polyurea, a polyethyleneimine, a polyarylene
sulfide, a polysiloxane, a polyimide, a polyacetate, a polyamide, a polyester, or a
polythioester. A "polysaccharide" is a biopolymer comprised of linked saccharide or sugar
units. A "nucleic acid" as used herein is a biopolymer comprised of nucleotides, such as
deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). A "polypeptide" as used herein
is a biopolymer comprised of linked amino acids.

As used herein with respect to linked units of a polymer, "linked" or "linkage" means
two entities are bound to one another by any physicochemical means. Any linkage known to
those of ordinary skill in the art, covalent or non-covalent, is embraced. Such linkages are
well known to those of ordinary skill in the art. Natural linkages, which are those ordinarily
found in nature connecting the chemical units of a particular polymer, are most common.
Natural linkages include, for instance, amide, ester and thioester linkages. The chemical units
of a polymer analyzed by the methods of the invention may be linked, however, by synthetic
or modified linkages. Polymers where the units are linked by covalent bonds will be most
common, but polymers whose units are linked by hydrogen bonds, etc., are also included.

The polymer is made up of a plurality of chemical units. A "chemical unit" as used
herein is a building block or monomer which may be linked directly or indirectly to other
building blocks or monomers to form a polymer. The polymer preferably is a polymer of at
least two different linked units. The particular type of unit will depend on the type of
polymer. For instance DNA is a biopolymer comprised of a deoxyribose phosphate backbone
composed of units of purines and pyrimidines such as adenine, cytosine, guanine, thymine,
5-methylcytosine, 2-aminopurine, 2-amino-6-chloropurine, 2,6-diaminopurine, hypoxanthine,
and other naturally and non-naturally occurring nucleobases, substituted and unsubstituted
aromatic moieties. RNA is a biopolymer comprised of a ribose phosphate backbone
composed of units of purines and pyrimidines such as those described for DNA but wherein
uracil is substituted for thymine. DNA units may be linked to the other units of the polymer
by their 5' or 3' hydroxyl group thereby forming an ester linkage. RNA units may be linked
to the other units of the polymer by their 5', 3' or 2' hydroxyl group thereby forming an ester
linkage. Alternatively, DNA or RNA units having a terminal 5', 3' or 2' amino group may be
linked to the other units of the polymer by the amino group thereby forming an amide linkage.

Whenever a nucleic acid is represented by a sequence of letters it will be understood
that the nucleotides are in 5' → 3' order from left to right and that "A" denotes adenosine,
"C" denotes cytidine, "G" denotes guanosine, "T" denotes thymidine, and "U" denotes uridine
unless otherwise noted.

The chemical units of a polypeptide are amino acids, including the 20 naturally
occurring amino acids as well as modified amino acids. Amino acids may exist as amides or
free acids and are linked to the other units in the backbone of the polymers through their
α-amino group thereby forming an amide linkage to the polymer.

A polysaccharide is a polymer composed of monosaccharides linked to one another.
In many polysaccharides the basic building block of the polysaccharide is actually a
disaccharide unit which may be repeating or non-repeating. Thus, a unit when used with
respect to a polysaccharide refers to a basic building block of a polysaccharide and may
include a monomeric building block (monosaccharide) or a dimeric building block
(disaccharide).

A "plurality of chemical units" is at least two units linked to one another.

The polymers may be native or naturally occurring polymers or non-naturally occurring
polymers which do not exist in nature. The polymers typically
include at least a portion of a naturally occurring polymer. The polymers may be isolated or
synthesized de novo. For example, the polymers may be isolated from natural sources, e.g.,
purified, as by cleavage and gel separation, or may be synthesized, e.g., (i) amplified in vitro
by, for example, polymerase chain reaction (PCR); (ii) synthesized by, for example, chemical
synthesis; or (iii) recombinantly produced by cloning, etc.

The invention is useful for identifying properties of polymers. A "property" as used
herein is a characteristic (e.g., structural characteristic) of the polymer that provides
information (e.g., structural information) about the polymer. When the term property is used
with respect to any polymer except a polysaccharide the property provides information other
than the identity of a unit of the polymer or the polymer itself. A compilation of several
properties of a polymer may provide sufficient information to identify a chemical unit or even
the entire polymer but the property of the polymer itself does not encompass the chemical
basis of the chemical unit or polymer.

When the term property is used with respect to polysaccharides, to define a
polysaccharide property, it has the same meaning as described above except that due to the
complexity of the polysaccharide, a property may identify a type of monomeric building block
of the polysaccharide. Chemical units of polysaccharides are much more complex than
chemical units of other polymers, such as nucleic acids and polypeptides. The
polysaccharide unit has more variables in addition to its basic chemical structure than other
chemical units. For example, the polysaccharide may be acetylated or sulfated at several sites
on the chemical unit, or it may be charged or uncharged. Thus, one property of a
polysaccharide may be the identity of one or more basic building blocks of the
polysaccharides.

A basic building block alone, however, may not provide information about the charge
and the nature of substituents of the saccharide or disaccharide. For example, a building block
of uronic acid may be iduronic or glucuronic acid. Each of these building blocks may have
additional substituents that add complexity to the structure of the chemical unit. A single
property, however, may not identify such additional substituents, charges, etc., in addition to
identifying a complete building block of a polysaccharide. This information, however, may
be assembled from several properties. Thus, a property of a polymer as used herein does not
encompass an amino acid or nucleotide but does encompass a saccharide or disaccharide
building block of a polysaccharide.

The type of property that will provide structural information about a polymer will
depend on the type of polymer being analyzed. For instance, if the polymer is a
polysaccharide a property such as charge, molecular weight, nature and degree of sulfation or
acetylation, or type of saccharide will provide structural information about the polymer. If the
polymer is a polypeptide then a property will provide information about charge, acidity, etc.
Properties include but are not limited to charge, chirality, nature of substituents, quantity of
substituents, molecular weight, molecular length, compositional ratios of substituents or units,
type of basic building block of a polysaccharide, hydrophobicity, enzymatic sensitivity,
hydrophilicity, secondary structure and conformation (e.g., position of helices), spatial
distribution of substituents, ratio of one set of modifications to another set of modifications
(e.g., relative amounts of 2-O sulfation to N-sulfation or ratio of iduronic acid to glucuronic
acid), and binding sites for proteins. Other properties will easily be identified by those of
ordinary skill in the art. A substituent, as used herein, is an atom or group of atoms that
substitutes a unit, but is not itself the unit.
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The property of the polymer may be identified by any means known in the art. The 
procedure used to identify the property will depend on the type of property. Molecular 
weight, for instance, may be determined by several methods including mass spectrometry. 
The use of mass spectrometry for determining the molecular weight of polymers is well 
5 known in the art. Mass Spectrometry has been used as a powerful tool to characterize 

polymers because of its accuracy (dblDalton) in reporting the masses of fragments generated 
e.g. by enzymatic cleavage and also because only pM sample concentrations are required. For 
instance matrix-assisted laser desorption ionization mass spectrometry (MALDI-MS) has been 
described for identifying the molecular weight of polysaccharide fragments in publications 

10 such as Rhomberg, A. J. et al, PNAS, USA, v. 95, p. 4176-41 81 (1998); Rhomberg, A. J. et al, 
PNAS, USA, v. 95, p. 12232-12237 (1998); and Ernst, S. et. al., PNAS, USA, v. 95, p. 4182- 
4 1 87 (1 998), each of which is hereby incorporated by reference. Other types of mass 
spectrometry known in the art, such as, electron spray-MS, fast atom bombardment mass 
spectrometry (FAB-MS) and collision-activated dissociation mass spectrometry (CAD) may 

15 also be used to identify the molecular weight of the polymer or polymer fragments. 

The mass spectrometry data may be a valuable tool to ascertain information about the
polymer fragment sizes after the polymer has undergone degradation with enzymes or
chemicals. After a molecular weight of a polymer is identified, it may be compared to
molecular weights of other known polymers. Because masses obtained from the mass
spectrometry data are accurate to one Dalton (1 D), a size of one or more polymer fragments
obtained by enzymatic digestion may be precisely determined, and a number of substituents
(i.e., sulfates and acetate groups present) may be determined. One technique for comparing
molecular weights is to generate a mass line and compare the molecular weight of the
unknown polymer to the mass line to determine a subpopulation of polymers which have the
same molecular weight. A "mass line" as used herein is an information database, preferably
in the form of a graph or chart, which stores information for each possible type of polymer
having a unique sequence based on the molecular weight of the polymer. Thus, a mass line
may describe a number of polymers having a particular molecular weight. A two-unit nucleic
acid molecule (i.e., a nucleic acid having two chemical units) has 16 (4^2) possible
polymers at a molecular weight corresponding to two nucleotides. A two-unit polysaccharide
(i.e., disaccharide) has 32 possible polymers at a molecular weight corresponding to two
saccharides. Thus, a mass line may be generated by uniquely assigning a particular mass to a
particular length of a given fragment (all possible di, tetra, hexa, octa, up to a
hexadecasaccharide), and tabulating the results (an example is shown in Figure 5).
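
For the shortest fragment length, a mass line can be pictured as a lookup table from mass to
the set of building blocks having that mass. The following Python sketch is illustrative only.
It assumes the 32 disaccharide units and the per-position mass contributions that this
description gives for its disaccharide notation (a 379.33 D base unit, 80.06 D per O-sulfate,
and 38.02 D for an N-sulfate in place of an N-acetate). The names build_mass_line and lookup
are hypothetical, and masses are held as integer hundredths of a Dalton so that equal masses
group exactly.

```python
from collections import defaultdict
from itertools import product

# Assumed values, in hundredths of a Dalton, taken from the disaccharide
# notation described elsewhere in this document.
BASE = 37933       # unsulfated, N-acetylated disaccharide
O_SULFATE = 8006   # each sulfate at the 2-O, 6-O or 3-O position
N_SULFATE = 3802   # N-sulfate in place of N-acetate


def build_mass_line():
    """Map mass (hundredths of a Dalton) -> list of 5-bit unit codes."""
    mass_line = defaultdict(list)
    # Bit order: I/G, 2X, 6X, 3X, NX. The I/G bit does not change the mass.
    for bits in product((0, 1), repeat=5):
        _, s2, s6, s3, n = bits
        mass = BASE + O_SULFATE * (s2 + s6 + s3) + N_SULFATE * n
        mass_line[mass].append("".join(map(str, bits)))
    return dict(mass_line)


def lookup(mass_line, measured, tolerance=1.0):
    """Return all unit codes whose mass lies within `tolerance` D of `measured`."""
    hits = []
    for mass, codes in mass_line.items():
        if abs(mass / 100 - measured) <= tolerance:
            hits.extend(codes)
    return sorted(hits)


mass_line = build_mass_line()
candidates = lookup(mass_line, 577.5)  # a hypothetical measured mass
```

Under these assumptions the 32 units fall on only eight distinct masses, so a measured mass
narrows the unit to a subpopulation rather than to a single structure.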

Table 1 below shows an example of a computed set of values for a polysaccharide. 
From Table 1, a number of chemical units of a polymer may be determined from the 
minimum difference in mass between a fragment of length n+1 and a fragment of length n. 
For example, if the repeat is a disaccharide unit, a fragment of length n has 2n
monosaccharide units: n=1 may correspond to a length of a disaccharide,
n=2 may correspond to a length of a tetrasaccharide, etc.



Fragment length n     Minimum difference in mass
                      between n+1 and n (D (Dalton))
-----------------     ------------------------------
1                     101.13
2                     13.03
3                     13.03
4                     9.01
5                     9.01
6                     4.99
7                     4.99
8                     0.97
9                     0.97

Table 1



Because mass spectrometry data indicates the mass of a fragment to 1 D accuracy, a
length may be assigned uniquely to a fragment by looking up its mass on the mass line. Further,
it may be determined from the mass line that, within a fragment of a particular length greater
than a disaccharide, there is a minimum difference in masses of 4.02 D, indicating that two
acetate groups (84.08 D) replaced a sulfate group (80.06 D). Therefore, a number of sulfates
and acetates of a polymer fragment may be determined from the mass given by the mass
spectrometry data, and such number may be assigned to the polymer fragment.

In addition to molecular weight, other properties may be determined using methods
known in the art. The compositional ratios of substituents or chemical units (quantity and
type of total substituents or chemical units) may be determined using methodology known in
the art, such as capillary electrophoresis. A polymer may be subjected to an experimental
constraint such as enzymatic or chemical degradation to separate each of the chemical units of
the polymer. These units then may be separated using capillary electrophoresis to determine
the quantity and type of substituents or chemical units present in the polymer. Additionally, a
number of substituents or chemical units can be determined using calculations based on the
molecular weight of the polymer.

In the method of capillary gel-electrophoresis, reaction samples may be analyzed by
small-diameter, gel-filled capillaries. The small diameter of the capillaries (50 µm) allows for
efficient dissipation of heat generated during electrophoresis. Thus, high field strengths can
be used without excessive Joule heating (400 V/m), lowering the separation time to about 20
minutes per reaction run and thereby increasing resolution over conventional gel electrophoresis.
Additionally, many capillaries may be analyzed in parallel, allowing amplification of
generated polymer information.

In addition to being useful for identifying a property, compositional analysis also may
be used to determine a presence and composition of an impurity as well as a main property of
the polymer. Such determinations may be accomplished if the impurity does not have the
same composition as the polymer. Determining whether an impurity is present may
involve accurately integrating an area under each peak that appears in the electrophoretogram
and normalizing the peaks to the smallest of the major peaks. The sum of the normalized
peaks should be equal to one or close to being equal to one. If it is not, then one or more
impurities are present. Impurities may be detected even in unknown samples if at least one of
the disaccharide units of the impurity differs from any disaccharide unit of the unknown.

If an impurity is present, one or more aspects of a composition of the components may
be determined using capillary electrophoresis. Because all known disaccharide units may be
baseline-separated by the capillary electrophoresis method described above and because
migration times typically are determined using electrophoresis (i.e., as opposed to
electroosmotic flow) and are reproducible, reliable assignment to a polymer fragment of the
various saccharide units may be achieved. Consequently, both a composition of the major
peak and a composition of a minor contaminant may be assigned to a polymer fragment. The
composition for both the major and minor components of a solution may be assigned as
described below.

One example of such assignment of compositions involves determining the
composition of the major AT-III binding HLGAG decasaccharide (±DDD4-7) and its minor
contaminant (±D5D4-7) present in solution in a 9:1 ratio. Complete digestion of this 9:1
mixture with heparinases yields 4 peaks: three representative of the major decasaccharide
(viz., D, 4, and -7), which are also present in the contaminant, and one peak, 5, that is present
only in the contaminant. In other words, the area of each peak for D, 4, and -7 represents an
additive combination of a contribution from the major decasaccharide and the contribution
from the contaminant, whereas the peak for 5 represents only the contaminant.

To assign the composition of the contaminant and the major component, the area
under the 5 peak may be used as a starting point. This area represents an area under the peak
for one disaccharide unit of the contaminant. Subtracting this area from the area under each of
the 4 and -7 peaks and subtracting twice this area from the area under D yields a 1:1:3 ratio of
4:-7:D. Such a ratio confirms the composition of the major component and indicates that the
composition of the impurity is two Ds, one 4, one -7 and one 5.
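
The subtraction just described can be written out as a short Python sketch. The peak areas
used here are idealized values computed from the stated 9:1 ratio, not measured data, and the
function name split_major_and_contaminant is hypothetical. The contaminant composition
(two Ds, one 4, one -7 and one 5) is supplied as an input, as in the reasoning above.

```python
def split_major_and_contaminant(areas, marker, contaminant_counts):
    """Subtract the contaminant's contribution from each peak area.

    areas: peak area per disaccharide unit.
    marker: unit that is present only in the contaminant.
    contaminant_counts: copies of each unit in the contaminant (assumed known).
    Returns the major component's peak areas, normalized to the smallest peak.
    """
    per_copy = areas[marker] / contaminant_counts[marker]
    major = {}
    for unit, area in areas.items():
        if unit != marker:
            major[unit] = area - per_copy * contaminant_counts.get(unit, 0)
    smallest = min(major.values())
    return {unit: area / smallest for unit, area in major.items()}


# Idealized areas (arbitrary units) for a 9:1 mixture of DDD4-7 and D5D4-7:
# D = 9*3 + 1*2, 4 = 9 + 1, -7 = 9 + 1, 5 = 1.
areas = {"D": 29.0, "4": 10.0, "-7": 10.0, "5": 1.0}
contaminant = {"D": 2, "4": 1, "-7": 1, "5": 1}
ratio = split_major_and_contaminant(areas, "5", contaminant)
```

With these inputs the remaining areas are 27, 9 and 9, which is the 1:1:3 ratio of 4:-7:D
given in the text.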

Methods of identifying other types of properties may be easily identifiable to those of
skill in the art and may depend on the type of property and the type of polymer. For example,
hydrophobicity may be determined using reverse-phase high-pressure liquid chromatography
(RP-HPLC). Enzymatic sensitivity may be identified by exposing the polymer to an enzyme
and determining a number of fragments present after such exposure. The chirality may be
determined using circular dichroism. Protein binding sites may be determined by mass
spectrometry, isothermal calorimetry and NMR. Enzymatic modification (not degradation)
may be determined in a similar manner as enzymatic degradation, i.e., by exposing a substrate
to the enzyme and using MALDI-MS to determine if the substrate is modified. For example,
a sulfotransferase may transfer a sulfate group to an HS chain, with a concomitant increase
in mass of 80 Da. Conformation may be determined by modeling and nuclear magnetic resonance
(NMR). The relative amounts of sulfation may be determined by compositional analysis or
approximately determined by Raman spectroscopy.

In some aspects the invention is useful for generating, searching and manipulating
information about polymers. In this aspect the complete building block of a polymer is
assigned a unique numeric identifier, which may be used to classify the complete building
block. For instance, if a polysaccharide is being analyzed, each numeric identifier would
represent a complete building block of a polysaccharide, including the exact chemical
structure as defined by the basic building block of a polysaccharide and all of its substituents,
charges, etc. A basic building block refers to a basic structure of the polymer unit, e.g., a basic
ring structure of a polysaccharide, such as iduronic acid or glucuronic acid, but does not
include substituents, charges, etc. The information is generated and processed in the same
manner as described above with respect to "properties" of polymers.

Currently, saccharide fragments are detected in capillary electrophoresis by
monitoring at 232 nm, the wavelength at which the Δ4,5 double bond, generated upon
heparinase cleavage, absorbs. However, other detection methods are possible. First, nitrous
acid cleavage of heparin fragments, followed by reduction with ³H-sodium borohydride, yields
degraded fragments having a ³H radioactive tag. This tag may be followed either by capillary
electrophoresis (counting radioactivity) or by mass spectrometry (by the
increase in mass). Another method of using radioactivity would be to label the heparin
fragment with ³⁵S. Similar to the types of detection possible for ³H-labeled fragments,
³⁵S-labeled fragments may be useful for radioactive detection (CE) or measurement of mass
differences (MS).

Especially in the case of ³⁵S, this detection will be powerful. In this case, the human
sulfotransferases may be used to label specifically a certain residue. This will give additional
structural information.

Nitrous acid degraded fragments, unlike heparinase-derived fragments, do not have a
UV-absorbing chromophore. As we have shown, MALDI-MS will record the mass of heparin
fragments regardless of how they are derived. For CE, two methods may be used to monitor
fragments that lack a suitable chromophore. The first is indirect detection of fragments. We may
detect heparin fragments with our CE methodology using a suitable background absorber,
e.g., 1,5-naphthalenedisulfonic acid. The second method for detection involves chelation of
metal ions by saccharides. The saccharide-metal complexes may be detected using UV-Vis,
just like monitoring the unsaturated double bond.

Other groups have begun the process of raising antibodies to specific HLGAG
sequences. We have previously shown that proteins, e.g., angiogenin, FGF, may be used as
the complexing agent instead of a synthetic, basic peptide. By extension, antibodies could be
used as a complexing agent for MALDI-MS analysis. This enables us to determine whether
specific sequences are present in an unknown sample simply by observing whether a given
antibody with a given sequence specificity complexes with the unknown using MALDI-MS.

The final point is that, using mass tags, we may distinguish the reducing end of a
glycosaminoglycan from the non-reducing end. All of these tags involve selective chemistry
with the anomeric OH (present at the reducing end of the polymer); thus labeling occurs at the
reducing end of the chain. One common tag is 2-aminobenzoic acid, which is fluorescent. In
general, tags involve chemistry of the following types: (1) reaction of amines with the
anomeric position to form imines (e.g., 2-aminobenzoic acid), (2) hydrazine reaction to form
hydrazones, and (3) reaction of semicarbazides with the anomeric OH to form semicarbazones.
Commonly used tags (other than 2-aminobenzoic acid) include the following compounds:

1. semicarbazide
2. Girard's P reagent
3. Girard's T reagent
4. p-aminobenzoic ethyl ester
5. biotin-x-hydrazide
6. 2-aminobenzamide
7. 2-aminopyridine
8. anthranilic acid
9. 5-[(4,6-dichlorotriazine-2-yl)amino]-fluorescein
10. 8-aminonaphthalene-1,3,6-trisulfonic acid
11. 2-aminoacridone

Referring to FIG. 1, a system 100 for sequencing polymers is shown. The system 100
includes a polymer database 102 which includes a plurality of records storing information
corresponding to a plurality of polymers. Each of the records may store information about
properties of the corresponding polymer, properties of the corresponding polymer's
constituent chemical units, or both. The polymers for which information is stored in the
polymer database 102 may be any kind of polymers. For example, the polymers may include
polysaccharides, nucleic acids, or polypeptides. In one embodiment, each of the records in
the polymer database 102 includes a polymer identifier (ID) that identifies the polymer
corresponding to the record. The record also includes chemical unit identifiers (IDs)
corresponding to chemical units that are constituents of the polymer corresponding to the
record. Polymers may be represented in the polymer database in other ways. For example,
records in the polymer database 102 may include only a polymer ID or may only include
chemical unit IDs.

The polymer database 102 may be any kind of storage medium capable of storing
information about polymers as described herein. For example, the polymer database 102 may
be a flat file, a relational database, a table in a database, an object or structure in a
computer-readable volatile or non-volatile memory, or any data accessible to a computer
program, such as data stored in a resource fork of an application program file on a
computer-readable storage medium.

In one embodiment, a polymer ID includes a plurality of fields for storing information
about properties of the polymer corresponding to the record containing the polymer ID.
Similarly, in one embodiment, chemical unit IDs include a plurality of fields for storing
information about properties of the chemical unit corresponding to the chemical unit ID.
Although the following description refers to the fields of chemical unit IDs, such description is
equally applicable to the fields of polymer IDs.

The fields of chemical unit IDs may store any kind of value that is capable of being
stored in a computer readable medium, such as a binary value, a hexadecimal value, an
integral decimal value, or a floating point value. The fields may store information about any
properties of the corresponding chemical unit.

A compositional analyzer 108 receives as input a sample polymer 106 and generates as
output polymer composition data 110 that is descriptive of the composition of the sample
polymer. A compositional analyzer as used herein is any type of equipment or experimental
procedure that may be used to identify a property of a polymer modified by an experimental
constraint, such as those described above. These include, for instance, but are not limited to
capillary electrophoresis, mass spectrometry, and chromatography. The polymer composition
data 110 includes information about the sample polymer 106, such as the properties of the
chemical units in the sample polymer 106 and the number of chemical units in the sample
polymer 106. A sequencer 112 generates a candidate list 116 of a subpopulation of polymers
that might match the sample polymer 106 in the process of sequencing the sample polymer
106 using information contained in a mass line 114 and the polymer database 102. A
candidate list is also referred to herein as a "population" of polymers. At the end of the
sequencing process, the candidate list 116 contains zero or more polymers that correspond to
the sample polymer 106. A subpopulation of polymers is defined as a set of polymers having
at least two properties in common with a sample polymer. It is useful to identify
subpopulations of polymers in order to have an information set with which to compare the
sample polymer 106.

Consider, for example, the sequence DD7DAD-7, which is a tetradecasaccharide (14
mer) of HLGAG containing 20 sulfate groups. The compositional analyzer 108 may, for
example, perform compositional analysis of DD7DAD-7 by degrading the sequence to its
disaccharide building blocks and analyzing the relative abundance of each unit using capillary
electrophoresis to generate the polymer composition data 110. The polymer composition data
110 in this case would show a major peak corresponding to ±D, a peak about 1/2 the size of the
major peak corresponding to ±7 and another peak about 1/4 the size of the major peak
corresponding to ±A. Note that the ± sign is used because degradation by heparinase would
create a double bond between the C4 and C5 atoms in the uronic acid ring, thereby leading to
the loss of the iduronic vs. glucuronic acid information. From the polymer composition data
110, it may be inferred that there are 4 ±Ds, 2 ±7s and a ±A in the sequence.

Referring to FIG. 2, a process 200 that may be performed by the sequencer 112 to
sequence the sample polymer 106 is shown. The sequencer 112 receives the polymer
composition data 110 from the compositional analyzer 108. The sequencer 112 uses the
polymer composition data 110 and the information contained in the polymer database 102 to
generate an initial candidate list 116 of all possible polymers: (1) having the same length as
the sample polymer 106 and (2) having the same constituent chemical units as the sample
polymer 106 (step 204).

For example, consider the sequence DD7DAD-7 mentioned above. The polymer
composition data 110 indicates that the sequence includes 4 ±Ds, 2 ±7s and one ±A, and
indicates that the length of the sample polymer 106 is seven. In this case, step 204 (generation
of the candidate list 116) involves generating all possible sequences having the same length as
the sample polymer 106 and having 4 ±Ds, 2 ±7s and a ±A. In one embodiment, the
sequencer 112 uses a brute force method to generate all sequences having these characteristics
by generating all sequences of length seven having 4 ±Ds, 2 ±7s and a ±A using standard
combinatoric methods.
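
A brute force enumeration of this kind may be sketched in Python as follows. This is only one
possible way to carry out step 204, not necessarily the way the sequencer does it. The function
name candidate_sequences is hypothetical, and each unit is written without a sign because the
composition data does not distinguish iduronic from glucuronic acid.

```python
from itertools import permutations


def candidate_sequences(composition):
    """Return every distinct ordering of the units in `composition`.

    composition: mapping of unit code -> number of copies.
    """
    units = [unit for unit, count in composition.items() for _ in range(count)]
    # permutations() treats repeated units as distinct, so a set removes duplicates.
    return sorted(set(permutations(units)))


# 4 D units, 2 "7" units and 1 "A" unit, as inferred from the composition data.
candidates = candidate_sequences({"D": 4, "7": 2, "A": 1})
```

For this composition the list has 7!/(4! 2! 1!) = 105 entries, one of which corresponds to the
order of units in DD7DAD-7.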

The sequencer 112 then uses the data from the mass line 114 to progressively
eliminate sequences from the list generated in step 204 until the number of sequences in the
list reaches a predetermined threshold (e.g., one). To perform such elimination, in one
embodiment, the sequencer 112 calculates the value of a predetermined property of each of
the polymers in the candidate list 116 (step 206). The predetermined property may, for
example, be the mass of the polymer. An example method for calculating the mass of a
polymer will be described in more detail below. The sequencer 112 compares the calculated
values of the predetermined property of the polymers in the candidate list 116 to the value of
the predetermined property of the sample polymer 106 (step 208). The sequencer 112
eliminates candidate polymers from the candidate list 116 whose predetermined property
values do not match the value of the predetermined property of the sample polymer 106
within a predetermined range (step 208). For example, if the predetermined property is
molecular weight, the predetermined range may be ±1.5 D.
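
The elimination of steps 206-208 amounts to a tolerance filter, sketched below in Python. The
candidate names and masses are made-up values chosen only to exercise the filter, and
filter_by_property is a hypothetical name.

```python
def filter_by_property(candidates, measured, tolerance=1.5):
    """Keep candidates whose calculated property lies within `tolerance` of `measured`.

    candidates: mapping of candidate name -> calculated property value.
    """
    return {
        name: value
        for name, value in candidates.items()
        if abs(value - measured) <= tolerance
    }


# Hypothetical calculated masses (in D) for three made-up candidates.
calculated = {"candidate_1": 2769.3, "candidate_2": 2770.4, "candidate_3": 2849.4}
kept = filter_by_property(calculated, measured=2769.9)
```

Here the first two candidates survive and the third, which differs by roughly the mass of one
sulfate group, is eliminated.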

The sequencer 112 applies an experimental constraint to the sample polymer 106 to
modify the sample polymer 106 (step 210). An "experimental constraint" as used herein is a
biochemical process performed on a polymer which results in a modification to the polymer
that may be detected. Experimental constraints include but are not limited to enzymatic
digestion, e.g., with an exoenzyme, an endoenzyme, or a restriction endonuclease; chemical
digestion; chemical modification; interaction with a binding compound; chemical peeling (i.e.,
removal of a monosaccharide unit); and enzymatic modification, for instance sulfation at a
particular position with a heparan sulfate sulfotransferase.

The sequencer 112 measures properties of the modified sample polymer 106 (step
212). The sequencer 112 eliminates from the candidate list 116 those candidate polymers
having property values that do not match the property values of the experimental results 122
(step 214).

If the size of the candidate list 116 is less than a predetermined threshold (e.g., 1) (step
216), then the sequencer 112 is done (step 218). The contents of the candidate list 116 at this
time represent the results of the sequencing process. The candidate list 116 may contain zero
or more polymers, depending upon the contents of the polymer database 102 and the value of
the predetermined threshold. If the size of the candidate list 116 is not less than the
predetermined threshold (step 216), steps 210-216 are repeated until the size of the candidate
list 116 falls below the predetermined threshold. When the sequencer 112 is done (step 218),
the sequencer 112 may, for example, display the candidate list 116 to the user on an output
device such as a computer monitor.
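
The loop of steps 210-216 may be sketched in Python as shown below, under several stated
assumptions. Each experiment is modeled as a pair consisting of a simulation function and the
value observed for the sample. The toy constraint cut_after, which cleaves after every
occurrence of one unit and reports fragment lengths, is hypothetical and does not model the
specificity of any real enzyme. The stopping test (list size at or below the threshold) is
also an assumption made for this sketch.

```python
def cut_after(unit):
    """Hypothetical constraint: cleave after each `unit`, report sorted fragment lengths."""
    def simulate(sequence):
        lengths, run = [], 0
        for u in sequence:
            run += 1
            if u == unit:
                lengths.append(run)
                run = 0
        if run:
            lengths.append(run)
        return tuple(sorted(lengths))
    return simulate


def converge(candidates, experiments, threshold=1):
    """Apply experiments in turn until at most `threshold` candidates remain.

    experiments: list of (simulate, observed) pairs, where `observed` stands in
    for the measurement made on the modified sample (steps 210-212).
    """
    remaining = list(candidates)
    for simulate, observed in experiments:
        if len(remaining) <= threshold:
            break
        # Step 214: drop candidates whose simulated result differs from the observation.
        remaining = [c for c in remaining if simulate(c) == observed]
    return remaining


sample = tuple("DD7DAD7")  # plays the role of the unknown sample
candidates = [sample, tuple("D7DDAD7"), tuple("DD7DDA7")]
experiments = [(cut, cut(sample)) for cut in (cut_after("7"), cut_after("A"))]
result = converge(candidates, experiments)
```

In this toy run the first constraint removes one candidate and the second removes another,
leaving only the sequence that matches the sample.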

Referring to FIG. 3, in another embodiment, the sequencer 112 uses a genetic
algorithm process 300 to generate the initial candidate list 116 and to modify the candidate list
116 in order to arrive at a final candidate polymer that identifies the sequence of the sample
polymer 106. The sequencer 112 generates a population of random sequences with the
composition indicated by the polymer composition data 110 and having the same length as the
sample polymer 106 (step 302). The sequencer 112 evaluates the fitness (score) of the
polymers in the candidate list 116 using a scoring function based on enzymatic
degradation by an enzyme ENZ (step 304). The genetic algorithm process 300 uses the fitness
values to decide which of the sequences in the candidate list 116 can survive into the next
generation and which of the sequences in the candidate list 116 have the highest chance of
producing other sequences of equal or higher fitness by cross-over and mutation. The
sequencer 112 then performs cross-over and mutation operations that carry fit sequences
in the candidate list 116 into the next generation (step 306). If at least a predetermined
number (e.g., three) of generations of the candidate list 116 include copies of the correct
sequence with the maximum fitness (step 308), then the sequencer 112 is done sequencing.
Otherwise, the sequencer 112 repeats steps 304-306 until the condition of step 308 is satisfied.
Cross-over and mutation operations are used by genetic algorithms to randomly sample the
different regions of a search space.

In one embodiment, steps 210 and 212 are automated (e.g., carried out by a computer).
For example, after the initial candidate list 116 has been generated (step 208), the sequencer
112 may divide the candidate list 116 into categories (the categories are preferably based on
properties), such as hepI cleavable, hepIII cleavable, and nitrous acid cleavable (the property
is enzymatic sensitivity). The sequencer 112 may then simulate the corresponding
degradation or modification of the sequences present in each of the categories and search for
those sequences that give fragments of unique masses. Based on the population of sequences
that can give fragments of unique masses upon degradation or modification, the sequencer
112 chooses the particular enzyme or chemical as the experimental constraint to eliminate
candidate polymers from the candidate list 116 (step x). Although in this example only hepI,
hepIII, and nitrous acid are used, other experimental constraints such as enzymes may be used,
including the exoenzymes and other HLGAG degrading chemicals.

In another embodiment, the sequencer 112 uses a chemical characteristic to guide the
choice of experimental constraint. For example, normalized frequencies of chemical units of
known polymers containing I2S, G, HNS and HNAc may be calculated. For example, the
normalized frequency f(I2S) of chemical units containing I2S may be calculated as f(I2S) =
(number of disaccharide units containing I2S) / (number of disaccharide units). An example
set of normalized frequencies calculated for known sequences in this way is shown in Table 2
below.



Sequence               f(I2S)   f(G)    f(HNS)   f(HNAc)   Constraints used for convergence
--------------------   ------   ----    ------   -------   ----------------------------------------
Octa2 DDD-5            0.75     0.25    1        0         Hep I and Hep III degradation
FGF binding DDDDD      1        0       1        0         Hep I normal and exhaustive degradation
ATIII binding DDD4-7   0.6      0.2     0.8      0.2       Hep I, Hep II and nitrous acid degradation

TABLE 2

The "constraints used for convergence" column indicates constraints that have been
shown empirically to achieve convergence for the corresponding known sequence. Once
compositional analysis has been performed on a sample (unknown) polymer, the relative
frequencies of I2S, G, HNS and HNAc in the sample sequence may be compared to the relative
frequencies of the known sequences in the table above in order to select a set of experimental
constraints to apply to the sample polymer. A
known sequence with relative frequencies that are similar to the relative frequencies of the
sample polymer may then be selected, and the experimental constraints identified with the
selected sequence (as shown in the table) may then be applied to the sample polymer.
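
The frequency calculation and the selection of a similar known sequence may be sketched in
Python as follows. Several points are assumptions made for illustration: each unit is written
as a hexadecimal digit whose four bits encode 2X, 6X, 3X and NX, with a leading "-" marking
glucuronic acid; f(G) counts every unit whose uronic acid is glucuronic; and similarity is
measured as the sum of absolute differences between frequencies, a choice the description
does not specify. The names tokenize, frequencies and closest_known are hypothetical.

```python
def tokenize(sequence):
    """Split a string such as 'DDD4-7' into unit codes ['D', 'D', 'D', '4', '-7']."""
    units, i = [], 0
    while i < len(sequence):
        width = 2 if sequence[i] == "-" else 1
        units.append(sequence[i:i + width])
        i += width
    return units


def frequencies(sequence):
    """Return (f(I2S), f(G), f(HNS), f(HNAc)) for a sequence of unit codes."""
    units = tokenize(sequence)
    i2s = g = hns = 0
    for unit in units:
        is_g = unit.startswith("-")        # leading '-' marks glucuronic acid
        bits = int(unit.lstrip("-"), 16)   # hex digit encodes 2X, 6X, 3X, NX
        i2s += (not is_g) and bool(bits & 0b1000)
        g += is_g
        hns += bool(bits & 0b0001)
    n = len(units)
    return (i2s / n, g / n, hns / n, (n - hns) / n)


# Known sequences and their constraints, as listed in Table 2.
KNOWN = {
    "DDD-5": "Hep I and Hep III degradation",
    "DDDDD": "Hep I normal and exhaustive degradation",
    "DDD4-7": "Hep I, Hep II and nitrous acid degradation",
}


def closest_known(sample):
    """Return the known sequence whose frequencies are nearest to the sample's."""
    target = frequencies(sample)

    def distance(known):
        return sum(abs(a - b) for a, b in zip(frequencies(known), target))

    return min(KNOWN, key=distance)
```

Applied to the three known sequences, frequencies reproduces the values listed in Table 2. For
a made-up sample such as DDDD-5, closest_known selects DDD-5, so the constraints listed for
that row would be tried first.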

For example, Table 2 demonstrates that f(G) and f(HNAc) are important
factors in the decision to use hepIII and nitrous acid, because nitrous acid clips after an HNS,
and hepIII clips after a disaccharide unit containing G. The disaccharide unit I2S-HNS,6S is the
dominant unit in heparin-like regions (i.e., highly-sulfated regions) of the HLGAG chains.
Therefore, if a sequence is more heparin-like, then hepI may be chosen as the default enzyme
and the information content present in chemical units containing G and HNAc becomes
important for choosing enzymes and chemicals other than hepI. Similarly, for low-sulfated
regions on HLGAG chains, hepIII may be a default enzyme and f(I2S) and f(HNS) become
important for choosing hepI and nitrous acid. Similarly, one may also calculate the positional
sulfate or acetate distribution along the chain and generate the criterion for using the
sulfotransferases or sulfatases for convergence.

In one embodiment, the polymer database 102 stores the mass of each polymer in the
polymer database 102. In this embodiment, step 206 (described above) may be performed
merely by retrieving the mass of the corresponding polymer from the polymer database 102.
In another embodiment, the polymer database 102 includes information indicating a mass of a
baseline polymer. For example, in one embodiment the polymer database 102 stores
information about disaccharides. Referring to Table 3, which illustrates one use of a binary
notational representation system to notate disaccharides, it may be seen that the mass of the
I-HNAc disaccharide unit is 379.33 D.



I/G   2X   6X   3X   NX   ALPH CODE   DISACC             MASS (ΔU)
---   --   --   --   --   ---------   ----------------   ---------
0     0    0    0    0    0           I-HNAc             379.33
0     0    0    0    1    1           I-HNS              417.35
0     0    0    1    0    2           I-HNAc,3S          459.39
0     0    0    1    1    3           I-HNS,3S           497.41
0     0    1    0    0    4           I-HNAc,6S          459.39
0     0    1    0    1    5           I-HNS,6S           497.41
0     0    1    1    0    6           I-HNAc,3S,6S       539.45
0     0    1    1    1    7           I-HNS,3S,6S        577.47
0     1    0    0    0    8           I2S-HNAc           459.39
0     1    0    0    1    9           I2S-HNS            497.41
0     1    0    1    0    A           I2S-HNAc,3S        539.45
0     1    0    1    1    B           I2S-HNS,3S         577.47
0     1    1    0    0    C           I2S-HNAc,6S        539.45
0     1    1    0    1    D           I2S-HNS,6S         577.47
0     1    1    1    0    E           I2S-HNAc,3S,6S     619.51
0     1    1    1    1    F           I2S-HNS,3S,6S      657.53
1     0    0    0    0    -0          G-HNAc             379.33
1     0    0    0    1    -1          G-HNS              417.35
1     0    0    1    0    -2          G-HNAc,3S          459.39
1     0    0    1    1    -3          G-HNS,3S           497.41
1     0    1    0    0    -4          G-HNAc,6S          459.39
1     0    1    0    1    -5          G-HNS,6S           497.41
1     0    1    1    0    -6          G-HNAc,3S,6S       539.45
1     0    1    1    1    -7          G-HNS,3S,6S        577.47
1     1    0    0    0    -8          G2S-HNAc           459.39
1     1    0    0    1    -9          G2S-HNS            497.41
1     1    0    1    0    -A          G2S-HNAc,3S        539.45
1     1    0    1    1    -B          G2S-HNS,3S         577.47
1     1    1    0    0    -C          G2S-HNAc,6S        539.45
1     1    1    0    1    -D          G2S-HNS,6S         577.47
1     1    1    1    0    -E          G2S-HNAc,3S,6S     619.51
1     1    1    1    1    -F          G2S-HNS,3S,6S      657.53

TABLE 3

In addition to the hexadecimal codes used in Table 3, the following extra symbols were
used to represent modifications in the disaccharide building block: 5-membered
anhydromannitol ring - '; uronic acid with a C4-C5 unsaturated linkage - ±; reducing end
disaccharide unit with a mass tag - (superscript) t; disaccharide unit without the uronic acid - *.

The polymer database 102 may include information indicating that sulfation at a
position of a polymer contributes 80.06D to the mass of the polymer and that substitution of a
sulfate for an acetate contributes an additional 38.02D to the mass of the polymer. Therefore,
the mass M of any polymer in the polymer database 102 may be calculated using the
following formula:

M = 379.33 + [0 80.06 80.06 80.06 38.02] * C,
where C is the vector containing the binary representation of the polymer and * is a vector
multiplication operator. For example, the mass of the disaccharide unit I2S-HNS,6S, having a
binary representation of 01101, would be equal to 379.33 + [0 80.06 80.06 80.06 38.02] *
[0 1 1 0 1] = 379.33 + 0 + 80.06*1 + 80.06*1 + 80.06*0 + 38.02*1 = 577.47D.
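
The formula above can be sketched in a few lines of Python. This is a minimal illustration, assuming the five-bit field order I/G, 2X, 6X, 3X, NX used in the table; the function name disaccharide_mass is hypothetical and not part of the described system.

```python
# Sketch of M = 379.33 + [0 80.06 80.06 80.06 38.02] * C for one disaccharide unit.
# Field order assumed: I/G, 2X, 6X, 3X, NX (one bit each).
BASE_MASS = 379.33
WEIGHTS = (0.0, 80.06, 80.06, 80.06, 38.02)


def disaccharide_mass(bits: str) -> float:
    """Return the mass of the unit whose 5-bit code is given as a string such as '01101'."""
    if len(bits) != 5 or set(bits) - {"0", "1"}:
        raise ValueError("expected a 5-character string of 0s and 1s")
    total = BASE_MASS + sum(w * int(b) for w, b in zip(WEIGHTS, bits))
    return round(total, 2)


if __name__ == "__main__":
    print(disaccharide_mass("01101"))  # I2S-HNS,6S, the worked example in the text
```

Rounding to two decimal places matches the precision of the tabulated masses; the I/G bit carries a weight of zero because epimerization does not change the mass.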

Although the invention encompasses all polymers, the use of the invention is described
in more detail with respect to polysaccharides because of the complex nature of
polysaccharides. The invention, however, is not limited to polysaccharides. The
heterogeneity of the heparin-like-glycosaminoglycan (HLGAG) fragments and the high
degree of variability in their saccharide building blocks have hindered attempts to
sequence these complex molecules. HLGAGs, which
include heparin and heparan sulfate, are complex polysaccharide molecules made up of
disaccharide repeat units of hexosamine and glucuronic/iduronic acid that are linked by α/β
1→4 glycosidic linkages. These defining units may be modified by sulfation at the N, 3-O and
6-O positions of the hexosamine, 2-O sulfation of the uronic acid, and C5 epimerization that
converts the glucuronic acid to iduronic acid. Schematically, the disaccharide unit of HLGAG
may be represented as

(α1→4) I/G 2OX (α/β1→4) H 3OX,NY,6OX (α1→4)

where

X may be sulfated (-SO3H) or unsulfated (-H)

Y may be sulfated (-SO3H) or acetylated (-COCH3)

HLGAGs may be represented using a notational system in which an HLGAG is
represented by a polymer ID (described above). The fields of the polymer ID may store any
kinds of values, such as single-bit values, single-digit hexadecimal values, or decimal values.

In one embodiment, the polymer ID representing an HLGAG includes each of the following
fields: (1) a field for storing a value indicating whether the polymer contains an iduronic or a
glucuronic acid (I/G); (2) a field for storing a value indicating whether the 2X position of the
iduronic or glucuronic acid is sulfated or unsulfated; (3) a field for storing a value indicating
whether the 6X position of the hexosamine is sulfated or unsulfated; (4) a field indicating
whether the 3X position of the hexosamine is sulfated or unsulfated; and (5) a field indicating
whether the NX position of the hexosamine is sulfated or acetylated.

In one embodiment, each of the fields is represented as a single bit. An example of the
use of this scheme to encode HLGAGs is shown in Table 1. Bit values for each of the fields
may be assigned in any manner. For example, with respect to the I/G field, in one
embodiment a value of one indicates iduronic and a value of zero indicates glucuronic, while
in another embodiment a value of one indicates glucuronic and a value of zero indicates
iduronic.

In one embodiment, the four fields (2X, 6X, 3X, and NX) are represented as a single
hexadecimal (base 16) number where each of the four fields represents one of the bits of the
hexadecimal number. Using hexadecimal numbers to represent disaccharide units is
convenient both for representation and processing because hexadecimal digits are a common
form of representation used by conventional computers. In a further embodiment, the five
fields (I/G, 2X, 6X, 3X, NX) are represented as a signed hexadecimal digit, in which the four
fields (2X, 6X, 3X, NX) are used to code a single-digit hexadecimal number as described
above and the I/G field is used as a sign bit. In this embodiment, the hexadecimal numbers 0 to
F may be used to code units containing iduronic acid and the hexadecimal numbers -0 to -F
may be used to code units containing glucuronic acid. The polymer unit ID may, however, be
encoded in other ways, such as by using a twos-complement representation.
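
The signed hexadecimal embodiment can be illustrated with a short Python sketch. It assumes the convention stated above (0 to F for iduronic acid, -0 to -F for glucuronic acid) and the bit order I/G, 2X, 6X, 3X, NX; the function names encode_unit and decode_unit are hypothetical.

```python
# Sketch of the signed hexadecimal unit code: the I/G bit acts as a sign,
# the remaining four bits (2X, 6X, 3X, NX) form one hexadecimal digit.
# Codes are kept as strings because "-0" and "0" denote different units,
# which an ordinary signed integer cannot distinguish.


def encode_unit(bits: str) -> str:
    """Map a 5-bit string such as '10101' to a signed hex code such as '-5'."""
    if len(bits) != 5 or set(bits) - {"0", "1"}:
        raise ValueError("expected a 5-character string of 0s and 1s")
    sign = "-" if bits[0] == "1" else ""
    return sign + format(int(bits[1:], 2), "X")


def decode_unit(code: str) -> str:
    """Inverse of encode_unit: '-5' -> '10101', 'D' -> '01101'."""
    ig_bit = "1" if code.startswith("-") else "0"
    return ig_bit + format(int(code.lstrip("-"), 16), "04b")


if __name__ == "__main__":
    print(encode_unit("01101"), encode_unit("10101"), decode_unit("-0"))
```

Keeping the code as text rather than as a machine integer is a design choice of this sketch; a twos-complement or plain 5-bit integer representation, as the text notes, would work equally well.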

HLGAG fragments may be degraded using enzymes such as heparin lyase enzymes or
nitrous acid, and they may also be modified using different enzymes that transfer sulfate
groups to the positions mentioned earlier or remove the sulfate groups from those positions.
The modifying enzymes are exolytic and non-processive, which means that they act just once
on the non-reducing end and then let go of the heparin chain without sequentially modifying
the rest of the chain. For each of the modifiable positions in the disaccharide unit there exists a
modifying enzyme. An enzyme that adds a sulfate group is called a sulfotransferase and an
enzyme that removes a sulfate group is called a sulfatase. The modifying enzymes include 2-O
sulfatase/sulfotransferase, 3-O sulfatase/sulfotransferase, 6-O sulfatase/sulfotransferase and
N-deacetylase-N-sulfotransferase. The function of these enzymes is evident from their names;
for example, a 2-O sulfotransferase transfers a sulfate group to the 2-O position of an iduronic
acid (2-O sulfated glucuronic acid is a rare occurrence in the HLGAG chains) and a 2-O
sulfatase removes the sulfate group from the 2-O position of an iduronic acid.

HLGAG degrading enzymes include heparinase I, heparinase II, heparinase III,
D-glucuronidase and L-iduronidase. The heparinases cleave at the glycosidic linkage before a
uronic acid. Heparinase I clips at a glycosidic linkage before a 2-O sulfated iduronic acid.
Heparinase III cleaves at a glycosidic linkage before an unsulfated glucuronic acid.
Heparinase II cleaves at both hep I and hep III cleavable sites. After cleavage by the
heparinases, the uronic acid before which the cleavage occurs loses the information of iduronic
vs. glucuronic acid because a double bond is created between the C4 and C5 atoms of the
uronic acid.

Glucuronidase and iduronidase, as their names suggest, cleave at the glycosidic linkage
after a glucuronic acid and an iduronic acid, respectively. Nitrous acid clips randomly at
glycosidic linkages after an N-sulfated hexosamine and converts the six-membered hexosamine
ring to a 5-membered anhydromannitol ring.

The above rules for the enzymes may easily be encoded into a computer as described 
above using binary arithmetic so that the activity of an enzyme on a sequence may be carried 
out using simple binary operators to give the fragments that would be formed from the 
enzymatic activity. 
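
As an illustration of how such rules reduce to binary operators, the following Python sketch encodes the heparinase rules stated above as bit masks. It assumes each unit is a 5-bit integer in the order I/G, 2X, 6X, 3X, NX with I/G = 1 meaning glucuronic acid; the names IG_BIT, S2_BIT, cleaves_before and digest are hypothetical, and the loss of I/G information at the newly formed non-reducing end is not modeled.

```python
# Sketch: heparinase cleavage rules as bit-mask tests on 5-bit unit codes.
IG_BIT = 0b10000  # assumed convention: 1 = glucuronic acid, 0 = iduronic acid
S2_BIT = 0b01000  # 1 = uronic acid is 2-O sulfated


def cleaves_before(enzyme: str, unit: int) -> bool:
    """True if the enzyme cleaves the linkage immediately before this unit."""
    tag = unit & (IG_BIT | S2_BIT)
    if enzyme == "hepI":    # before a 2-O sulfated iduronic acid
        return tag == S2_BIT
    if enzyme == "hepIII":  # before an unsulfated glucuronic acid
        return tag == IG_BIT
    if enzyme == "hepII":   # at both hep I and hep III cleavable sites
        return tag in (S2_BIT, IG_BIT)
    raise ValueError(f"unknown enzyme: {enzyme}")


def digest(enzyme: str, sequence: list) -> list:
    """Split a sequence (non-reducing end first) into the fragments formed."""
    if not sequence:
        return []
    fragments = [[sequence[0]]]
    for unit in sequence[1:]:
        if cleaves_before(enzyme, unit):
            fragments.append([])
        fragments[-1].append(unit)
    return fragments


if __name__ == "__main__":
    seq = [0b01101, 0b01101, 0b01101, 0b10101]  # D D D -5
    print(digest("hepI", seq))
    print(digest("hepIII", seq))
```

The first unit is never a cleavage point because there is no linkage before it.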

These techniques may be used to construct a database of polysaccharide sequences.
In some aspects the invention is a database of polysaccharide sequences, as well as motif
search and sequence alignment algorithms for obtaining valuable information about the nature
of polysaccharide-protein interactions that are vital for the biological functioning of these
molecules. The sequence information in the database of polysaccharide sequences may also
be used to provide valuable insight into sequence-structure relationships of these molecules.

In addition to the use of the methods of the invention for sequencing polymers, the
methods may be used for any purpose in which it is desirable to identify structural properties
related to a polymer. For instance, the methods of the invention may be used for analysis of
low molecular weight heparin (LMWH). By limited digestion of LMWH and analysis by CE and
MALDI-MS, we may obtain a "digest spectrum" of various preparations of LMWH, thus
deriving information about the composition and variations thereof. Such information is of
value in terms of quality control for LMWH preparations.

The methods are also useful for understanding the role of HLGAGs in fundamental
biological processes. MS has already been used to look at the presence of various proteins as
a function of time in Drosophila development. In a similar fashion, HLGAG expression can
be studied as a function both of position and of time in Drosophila development. Similarly, the
methods may be used as a diagnostic tool for human diseases. There is a group of human
diseases called mucopolysaccharidoses (MPS). The molecular basis for these diseases is
mostly in the degradation pathway for HLGAGs. For instance, mucopolysaccharidosis type I
involves a defect in iduronidase, which clips unsulfated iduronate residues from HLGAG
chains. Similarly, persons suffering from mucopolysaccharidosis type II (MPS II) lack
iduronate-2-sulfatase. In each of these disorders, marked changes in the composition and
sequence of cell surface HLGAGs occur. Our methodology could be used as a diagnostic for
these disorders to identify which MPS syndrome a patient is suffering from.

Additionally, the methods of the invention are useful for mapping protein-binding
HLGAG sequences. Analogous to fingerprinting DNA, the MALDI-MS sequencing approach
may be used to specifically map HLGAG sequences that bind to selected proteins. This is
achieved by sequencing the HLGAG chain in the presence of a target protein as well as in the
absence of the particular protein. In this manner, sequences protected from digestion are
indicative of sequences that bind with high affinity to the target protein.

The methods of the invention may be used to analyze branched or unbranched
polymers. Analysis of branched polymers is more difficult than analysis of unbranched
polymers because branched carbohydrates are "information dense" molecules. Branched
polysaccharides include a few building blocks that can be combined in several different ways,
thereby coding for many sequences. For instance, a trisaccharide, in theory, can give rise to
over 6 million different sequences. The methods for analyzing branched polysaccharides, in
particular, are advanced by the creation of an efficient nomenclature that is amenable to
computational manipulation. Thus, an efficient nomenclature for branched sugars that is
amenable to computational manipulation has been developed according to the invention. Two
types of numerical schemes that may encode the sequence information of these
polysaccharides have been developed in order to bridge the widely used graphic (pictorial)
representation and the proposed numerical scheme discussed below.

a. Byte-based (Binary-scheme) notation scheme: The first notation scheme is based on a
binary numerical system. The binary representation in conjunction with a tree-traversing
algorithm is used to represent all the possible combinations of the branched polysaccharides.
The nodes (branch points) are easily amenable to computational searching through
tree-traversing algorithms (Figure 4A). Figure 4A shows a notation scheme for branched sugars.
Each monosaccharide unit can be represented as a node (N) in a tree. The building blocks can
be defined as either (A), (B), or (C) where N1, N2, N3, and N4 are individual
monosaccharides. Each of these combinations can be coded numerically to represent building
blocks of information. By defining glycosylation patterns in this way, there are several tree
traversal and searching algorithms in computer science that may be applied to solve this
problem.

A simpler version of this notational scheme is shown in Figure 4B. This simplified
version may be extended to include all other possible modifications including unusual
structures. For example, an N-linked glycosylation in vertebrates contains a core region (the
tri-mannosyl chitobiose moiety), and up to four branched chains from the core. In addition to
the branched chains, the notation scheme also includes other modifications (such as addition of
fucose to the core, fucosylation of the GlcNAc in the branches, or sialic acid on the
branches). Thus, the superfamily of N-linked polysaccharides can be broadly represented by
three modular units: a) core region: regular, fucosylated and/or bisected with a GlcNAc, b)
number of branches: up to four branched chains, each with GlcNAc, Gal and Neu, and c)
modifications of the branch sugars. These modular units may be systematically combined to
generate all possible combinations of the polysaccharide. Representation of the branches and
the sequences within the branches can be performed as an n-bit binary code (0 and 1) where n
is the number of monosaccharides in the branch. Figure 4C depicts a binary code containing
the entire information regarding the branch. Since there are up to four branches possible, each
branch can be represented by a 3-bit binary code, giving a total of 12 binary bits. The first bit
represents the presence (binary 1) or absence (binary 0) of the GlcNAc residue adjoining the
mannose. The second and the third bits similarly represent the presence or absence of the Gal
and the Neu residues in the branch. Hence a complete chain containing GlcNAc-Gal-Neu is
represented as binary (111), which is equivalent to decimal 7. The four branches can then be
represented by a four-digit decimal code, with the first digit of the decimal code for the first
branch, the second digit for the second branch, etc. (right).

This simple binary code does not contain the information regarding the linkage (α vs. β
and the 1-6 or 1-3 etc.) to the core. This type of notation scheme, however, may be easily
expanded to include additional bits for branch modification. For instance, the presence of a
2-6 branched neuraminic acid to the GlcNAc in the branch can be encoded by a binary bit.
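
The branch code described above can be sketched as follows. This is an illustrative Python fragment only: it assumes the bit order GlcNAc, Gal, Neu within each branch and pads missing branches with 0; the names RESIDUES, branch_digit and glycan_code are hypothetical, and linkage information is, as noted, not encoded.

```python
# Sketch of the 3-bit-per-branch code: one decimal digit (0-7) per branch,
# four branches giving a four-digit code.
RESIDUES = ("GlcNAc", "Gal", "Neu")  # most significant bit first


def branch_digit(residues_present) -> int:
    """3-bit presence/absence code of one branch, returned as a decimal digit."""
    bits = "".join("1" if r in residues_present else "0" for r in RESIDUES)
    return int(bits, 2)


def glycan_code(branches) -> str:
    """Four-digit code for up to four branches; absent branches are coded 0."""
    if len(branches) > 4:
        raise ValueError("at most four branches are supported")
    padded = list(branches) + [()] * (4 - len(branches))
    return "".join(str(branch_digit(b)) for b in padded)


if __name__ == "__main__":
    print(glycan_code([("GlcNAc", "Gal", "Neu"), ("GlcNAc", "Gal"), ("GlcNAc",)]))
```

An extra modification, such as the 2-6 branched neuraminic acid mentioned above, could be accommodated by appending a fourth entry to RESIDUES, at the cost of each branch no longer fitting in a single decimal digit.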

b. Prime Decimal Notation Scheme: Similar to the binary notation described above, a
second computationally friendly numerical system, which involves the use of a prime number
scheme, has been developed. The algebra of prime numbers is extensively used in areas of
encoding, cryptography and computational data manipulations. The scheme is based on the
theorem that every whole number greater than one has a unique set of prime divisors (its
prime factorization), which is easy to compute for small numbers. In this
way, composition information may be rapidly and accurately analyzed.

This scheme is illustrated by the following example. The prime numbers 2, 3, 5, 7, 11,
13, 17, 19, and 23 are assigned to nine common building blocks of polysaccharides. The
composition of a polysaccharide chain may then be represented as the product of the prime
decimals that represent each of the building blocks. For illustration, GlcNAc is assigned the
number 3 and mannose the number 2. The core is represented in this scheme as 2x2x2x3x3
= 72 (3 mannose and 2 GlcNAcs). This notation, therefore, relies on the mathematical
principle that 72 can ONLY be expressed as the combination of three 2s and two 3s. The
prime divisors are therefore unique and can encode the composition information. This
becomes a problem when one gets to very large numbers but is not an issue for the size of
numbers we encounter in this analysis. From this number the mass of the polysaccharide
chain can be determined.
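
A minimal Python sketch of the prime scheme follows. Only the assignments mannose = 2 and GlcNAc = 3 come from the text; the dictionary name PRIMES and the functions encode_composition and decode_composition are hypothetical, and the remaining seven primes are left unassigned here because the text does not say which building blocks they denote.

```python
# Sketch of the prime decimal notation: a composition is the product of the
# primes assigned to its building blocks, and is recovered by trial division.
PRIMES = {"Man": 2, "GlcNAc": 3}  # assignments taken from the example in the text


def encode_composition(counts: dict) -> int:
    """Product of primes, each raised to the number of times its block occurs."""
    code = 1
    for block, n in counts.items():
        code *= PRIMES[block] ** n
    return code


def decode_composition(code: int) -> dict:
    """Recover block counts by repeated division; fails on unassigned factors."""
    counts = {}
    for block, p in PRIMES.items():
        while code % p == 0:
            counts[block] = counts.get(block, 0) + 1
            code //= p
    if code != 1:
        raise ValueError("code contains a prime factor with no assigned block")
    return counts


if __name__ == "__main__":
    core = encode_composition({"Man": 3, "GlcNAc": 2})
    print(core, decode_composition(core))
```

Because the product ignores order, the code captures composition only, not sequence, which is consistent with its use here for rapid composition analysis.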

The power of the computational approaches of the notational scheme may be used to
systematically develop an exhaustive list of all possible combinations of the polysaccharide
sequences. For instance, an unconstrained combinatorial list of possible sequences of size m^n,
where m is the number of building blocks and n is the number of positions in the chain, may be
used. In Figure 4C, there are 256 different saccharide combinations that are theoretically
possible (4 combinations for each branch and 4 branches = 4^4).

A mass line of the 256 different polysaccharide structures may be plotted. Then the
rules of biosynthetic pathways may be used to further analyze the polysaccharide. In the
example (shown in Figure 4B), it is known that the first step of the biosynthetic pathway is the
addition of GlcNAc at the 1-3 linked chain (branch 1). Thus, branch 1 should be present for
any of the other branches to exist. Based on this rule the 256 possible combinations may be
reduced using a factorial approach to conclude that branches 2, 3, and 4 exist if and only if
branch one is non-zero. Similar constraints can be incorporated at the notation level before
generation of the master list of ensembles. With the notation scheme in place, experimental
data can be generated (such as MALDI-MS or CE or chromatography) and those sequences
that do not satisfy this data can be eliminated. An iterative procedure therefore enables a
rapid convergence to a solution.
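
The enumeration and the biosynthetic constraint can be sketched in Python. The four states per branch are an assumption of this sketch (no residues, GlcNAc, GlcNAc-Gal, GlcNAc-Gal-Neu, written as the 3-bit codes 0, 4, 6 and 7); the text states only that there are four combinations per branch. The names BRANCH_STATES and allowed are hypothetical.

```python
# Sketch: build the unconstrained master list of 4^4 branch combinations,
# then apply the rule that branches 2-4 may exist only if branch 1 exists.
from itertools import product

BRANCH_STATES = (0, 4, 6, 7)  # assumed: none, GlcNAc, GlcNAc-Gal, GlcNAc-Gal-Neu

all_structures = list(product(BRANCH_STATES, repeat=4))


def allowed(structure) -> bool:
    """Branch 1 must be non-zero whenever any of branches 2, 3 or 4 is non-zero."""
    return structure[0] != 0 or not any(structure[1:])


constrained = [s for s in all_structures if allowed(s)]

if __name__ == "__main__":
    print(len(all_structures), len(constrained))
```

Under these assumptions the rule removes every structure that has an empty branch 1 but some other branch populated. Experimental constraints would then be applied to the reduced list in the same filter-style fashion.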

To identify branching patterns, a combination of MALDI-MS and CE (or other
techniques) may be used, as shown in the Examples. Elimination of the pendant arms of the
branched polysaccharide may be achieved by the judicious use of exo- and endoenzymes. All
antennary groups may be removed, retaining only the GlcNAc moieties extending from the
mannose core and forming an "extended" core. In this way, information about branching is
retained, but separation and identification of glycoforms is made simpler. One methodology
that could be employed to form extended cores for most polysaccharide structures is the
following. Addition of sialidases and fucosidases will remove capping and branching groups
from the arms. Then application of endo-β-galactosidase will cleave the arms to the extended
core. For more unusual structures, other exoglycosidases are available, for instance xylases
and glucosidases. By addition of a cocktail of degradation enzymes, any polysaccharide motif
may be reduced to its corresponding "extended" core. Identification of "extended" core
structures will be made by mass spectral analysis. There are unique mass signatures associated
with an extended core motif depending on the number of pendant arms (Figure 4D). Figure
4D shows a mass line of the "extended" core motifs generated upon exhaustive digest of
glycan structures by the enzyme cocktail. Shown are the expected masses of mono-, di-, tri-
and tetraantennary structures both with and without a fucose linked α1→6 to the core GlcNAc
moiety (from left to right). All of the "extended" core structures have a unique mass signature
that is easily resolved by MALDI-MS. Quantification of the various
glycan cores present may be completed by capillary electrophoresis, which has proven to be a
highly rapid and sensitive means for quantifying polysaccharide structures. [Kakehi, K. and S.
Honda, Analysis of glycoproteins, glycopeptides and glycoprotein-derived polysaccharides by
high-performance capillary electrophoresis. J Chromatogr A, 1996. 720(1-2): p. 377-93.]

Examples

Example 1: Identification of the number of fragments versus the fragment mass for Di,
Tetra, and Hexasaccharide.

The masses of all the possible disaccharide, tetrasaccharide and hexasaccharide
fragments were calculated and are shown in the mass line in Figure 5. The X axis
shows the different possible masses of the di-, tetra- and hexasaccharides and the Y axis shows
the number of fragments having that particular mass. Although there is a considerable
overlap between the tetra- and hexasaccharides, the minimum difference in their masses is
13.03D. Note that the Y axis has been broken to omit values between 17 and 40, to show all
the bars clearly.

Example 2: Sequencing of an octasaccharide of HLGAG.

Using hep I, hep II, hep III, nitrous acid, and exoenzymes, such as 2-sulfatase,
α-iduronidase, β-glucuronidase and N-deacetylase, as experimental constraints and the computer
algorithm described above, an octasaccharide (O2), two decasaccharides (FGF-binding and
AT-III-binding) and a hexasaccharide sequence of HLGAG were sequenced.

1. Compositional Analysis of O2:

Compositional analysis of O2 was completed by exhaustive digest of a 30 µM sample
with heparinases I-III and analysis by capillary electrophoresis (CE). Briefly, to 10 µL of
polysaccharide was added 200 nM of heparinases I-III in sodium phosphate buffer pH 7.0.
The reaction was allowed to proceed at 30°C overnight. For CE analysis the sample was
brought to 25 µL. Naphthalene trisulfonic acid (2 µM) was run as an internal standard.
Assignments of ΔU2S-HNS,6S and ΔU-HNS,6S were made on the basis that they comigrated with
known standards. The internal standard migrated between 4 and 6 min, the trisulfated
disaccharide ΔU2S-HNS,6S migrated between 6 and 8 min and the disulfated disaccharide
ΔU-HNS,6S migrated between 8 and 10 min. Integration of the peaks indicated that the relative
amounts of the two saccharides were 3:1.

The CE data for the O2 octasaccharide demonstrated that there is a major peak
corresponding to the commonly occurring trisulfated disaccharide (ΔU2S-HNS,6S) and a small
peak that corresponds to a disulfated disaccharide (ΔU-HNS,6S). The relative abundance of
these disaccharide units obtained from the CE data shows that there are 3 Ds (±) and a 5 (±).
The number of possible combinations of sequences having these disaccharide units is 32. The
possible combinations are shown in Table 4 below.



Possible sequences:

±DDD5      ±D5DD      ±5DDD      ±D-DD-5
±5DD-D     ±D-DD-5    ±5DDD-D    ±D-5DD
±D-D-D5    ±5D-DD     ±DD-D5     ±D5D-D
±D-5D-D    ±DD5D      ±D5-DD     ±DD-5D
±D-5-DD    ±D5-D-D    ±DD-5-D    ±D-5-D-D
±D-D-D-5   ±5D-D-D    ±D-D5D     ±5-DDD
±5-DD-D    ±D-D5D     ±5-D-DD

Seq        Fragments formed
±DDD-5     ±D    ±D    ±D-5
±DD5D      ±D    ±D    ±D5
±DD-5D     ±D    ±D    ±D5

Heparinase III digest (iii)
±DDD-5     ±DDD (1732)    ±5



2. Digestion of O2 with heparinase I:

Digestion of O2 was completed using both a short procedure and an exhaustive digest.
"Short" digestion was defined as using 100 nM of heparinase I and a digestion time of
10 minutes. "Exhaustive" digestion was defined as overnight digestion with 200 nM enzyme.
All digests were completed at room temperature. In the case of O2, both digest conditions
yielded the same results. Short digestion with heparinase I yields a pentasulfated tetrasaccharide
(no acetyl groups) of m/z 5300.1 (1074.6) and a disaccharide of m/z 4802.6 (577.1)
corresponding to a trisulfated disaccharide. This profile did not change upon exhaustive
digest of O2.

Upon treatment with heparinase I, O2 is clipped to form fragments with m/z 4802.6
and 5300.1. From the masses of these fragments it was possible to uniquely determine that
m/z of 4802.6 corresponded to a trisulfated disaccharide and m/z of 5300.1 corresponded to a
pentasulfated tetrasaccharide. Since the disaccharide composition of the sequence was known,
the only trisulfated disaccharide that may be formed is ±D and the possible pentasulfated
tetrasaccharides that may be formed are ±5D, ±5-D, ±D5 and ±D-5. After identification of
the fragments, the next step was to arrange them to give the right sequence. Since this was a
cumbersome job to be handled manually, a computer simulation was used to progressively
eliminate sequences from the master list that did not fit the experimental data. Using the rule
that heparinase I cleaves before an I2S, the heparinase I digestion was simulated on the
computer to generate the fragments for all the 32 sequences in the master list. From the list of
fragments formed for each sequence, the computer was used to search for fragments that
corresponded to the di- and tetrasaccharide observed from the mass spectrometry data. The
sequences that gave the fragments that fit the mass spec data of hep I are shown in Fig 5a. It
may be observed from Fig 5a that all the sequences have 3 Ds, which is consistent with the
known rules for hep I digestion used to produce these fragments. It may also be observed that
two arrangements give the same product profile, namely having the ±5 (I-HNS,6S or
G-HNS,6S) at the reducing end and having ±5 at the second position from the non-reducing end.
To resolve this issue a second experimental constraint, digestion with hep III, was used.

Table 4 provides a list of sequences that satisfy the product profiles of hep I and hep III
digests of the octasaccharide O2. (a) shows the sequences that gave the di- and tetrasaccharide
fragments as observed from the mass spectrometry data. The fragments listed below along
with their masses are those generated by computer simulation of hep I digest. (b) shows the
sequences in (a) that give the hexasaccharide fragment observed in the mass spectrometry data
after hep III digestion. The fragments along with their masses were generated by computer
simulation of hep III digestion.

3. Digestion of O2 with heparinase III:

Digestion of O2 with heparinase III yielded a nonasulfated hexasaccharide of m/z
5958.7 (1731.9) and an unobserved disulfated disaccharide (to conserve sulfates). Both short
and exhaustive digests yielded the same profile.

Heparinase III treatment of O2 resulted in a major fragment of m/z 5958.7 which was
uniquely identified as a hexasaccharide with 9 sulfate groups. The only sequence that
satisfied the product profile of hep III digestion was ±DDD-5, which is shown in Table 4.
Table 4 shows that there should be a -5 (G-HNS,6S) at the reducing end. This was consistent
with the rule used for hep III digestion, i.e. hep III clips before a G. The masses shown in the
table are integers. The masses used to search for the required fragments were accurate to two
decimal places.

Thus it was possible to demonstrate the ability to converge to the final sequence
starting from the list of all possible sequences by eliminating sequences that do not fit
experimental data. Since the starting point was a list of all the possible sequences given the
composition of a sequence, it was not possible that any sequences were missed during the
analysis.
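
The elimination procedure of this example can be sketched end to end in Python. Several simplifications are assumptions of this sketch rather than the method actually used: fragments are matched by size and sulfate count instead of by mass to two decimal places, the I/G state of the first unit is treated as unknown ("?"), and only the hep I and hep III rules stated earlier are modeled. All function names are hypothetical.

```python
# Sketch: build the master list for O2 (three D units and one 5 unit),
# simulate hep I and hep III digests, and keep sequences matching both profiles.
from itertools import permutations, product


def master_list(units):
    """All orderings; first unit I/G unknown ('?'), later units '+' (I) or '-' (G)."""
    sequences = []
    for order in sorted(set(permutations(units))):
        for signs in product("+-", repeat=len(order) - 1):
            sequences.append([("?", order[0])] + list(zip(signs, order[1:])))
    return sequences


def cleaves_before(enzyme, unit):
    sign, digit = unit
    two_o_sulfated = bool(int(digit, 16) & 0b1000)
    if enzyme == "hepI":    # before a 2-O sulfated iduronic acid
        return sign == "+" and two_o_sulfated
    if enzyme == "hepIII":  # before an unsulfated glucuronic acid
        return sign == "-" and not two_o_sulfated
    raise ValueError(f"unknown enzyme: {enzyme}")


def digest(enzyme, sequence):
    fragments = [[sequence[0]]]
    for unit in sequence[1:]:
        if cleaves_before(enzyme, unit):
            fragments.append([])
        fragments[-1].append(unit)
    return fragments


def profile(fragments):
    """Set of (number of disaccharide units, number of sulfates) per fragment."""
    return {(len(f), sum(bin(int(d, 16)).count("1") for _, d in f)) for f in fragments}


def render(sequence):
    return "±" + sequence[0][1] + "".join(("-" if s == "-" else "") + d for s, d in sequence[1:])


# Observed profiles from the text: hep I gives a trisulfated disaccharide and a
# pentasulfated tetrasaccharide; hep III gives a nonasulfated hexasaccharide and
# (inferred) a disulfated disaccharide.
OBSERVED = {"hepI": {(1, 3), (2, 5)}, "hepIII": {(3, 9), (1, 2)}}

candidates = master_list("DDD5")
survivors = [s for s in candidates
             if all(profile(digest(e, s)) == seen for e, seen in OBSERVED.items())]

if __name__ == "__main__":
    print(len(candidates), [render(s) for s in survivors])
```

Under these assumptions the master list has 32 entries and the two digests together leave a single candidate, in line with the convergence described above.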

Example 3: Sequencing of a basic fibroblast growth factor (FGF-2) binding saccharide.

MALDI-MS of a basic fibroblast growth factor (FGF-2) binding saccharide was
performed to determine the mass and size of the saccharide as a complex with FGF-2 (G.
Venkataraman et al., PNAS 96, 1892 (1999)). Dimers of FGF-2 bound to the saccharide (S),
yielding a species with an m/z of 37,009. By subtraction of the FGF-2 molecular weight, the
molecular mass of the saccharide was determined to be 2808, corresponding to a
decasaccharide with 14 sulfates and an anhydromannitol at the reducing end.

1. Compositional Analysis:

Compositional analysis and CE of the FGF-2 binding saccharide were completed as
described above. Compositional analysis of this sample resulted in two peaks corresponding
to ±D (ΔU2S-HNS,6S) and ±D' (ΔU2S-Man6S) in the ratio 3:1. As this decasaccharide was derived
by nitrous acid degradation of heparin, the uronic acid at the non-reducing end was not
observed by CE (232 nm). Therefore, the non-reducing end residue was identified as
+D (I2S-HNS,6S) by sequencing with exoenzymes. The number of possible sequences with this
composition is 16 (Table 5(i)). Of the 16 sequences, those that could result in the observed
fragments upon heparinase I digestion of the decasaccharide are shown in Table 5(ii).



INSERT TABLE 5 HERE

2. Digestion with heparinase I and heparinase III:

To resolve the isomeric state of the internal uronic acid (+D vs. -D), exhaustive
digestion of the saccharide with heparinase I and heparinase III was performed. Heparinase I
exhaustive digestion of the saccharide results in only two species corresponding to a
trisulfated disaccharide (±D) and its anhydromannitol derivative, while heparinase III did not
cleave the decasaccharide at all.

Heparinase I digestion of the decasaccharide yielded a pentasulfated tetrasaccharide
(m/z 5286.3) with an anhydromannitol at the reducing end and a trisulfated disaccharide of
m/z 4804.6. Table 5 shows the convergence of the FGF binding decasaccharide sequence.
Thus, it provides a list of sequences that satisfied the mass spectrometry product profiles of
the FGF-2 binding saccharide on treatment with hep I. Section (i) of Table 5 shows the master
list of 16 sequences derived from compositional analysis and exoenzyme sequencing of the
non-reducing end. The disaccharide unit at the non-reducing end was assigned to be a +D using
exoenzymes and the anhydromannitol group at the reducing end is shown as '. The masses of
the fragments resulting from digestion of the decasaccharide with heparinase I are shown in (ii).
Also shown in (ii) are those sequences from (i) that satisfy the heparinase I digestion data.
Section (iii) of Table 5 shows the sequence of the decasaccharide from (ii) that satisfies the data
from exhaustive digestion using heparinase I. This product profile may be obtained only if
there is a hep I cleavable site at every position in the decasaccharide, which led us to converge
to the final sequence DDDDD' shown in section (iii) of Table 5. The above taken together
confirm the sequence of the FGF-2 binding decasaccharide to be DDDDD'
[(I2S-HNS,6S)4-I2S-Man6S].

Example 4: Sequencing of an AT-III binding saccharide.

An AT-III binding saccharide was used as an example of the determination of a
complex sequence.

1. Compositional Analysis:

Compositional analysis and CE were completed as described above. Compositional
analysis of an AT-III binding saccharide indicated the presence of three building blocks,
corresponding to ΔU2S-HNS,6S (±D), ΔU-HNAc,6S (±4) and ΔU-HNS,3S,6S (±7) in the relative
ratio of 3:1:1, respectively. The shortest polysaccharide that may be formed with this composition
corresponds to a decasaccharide, consistent with the MALDI-MS data. The total number of
possible combinations of this tridecasulfated, singly acetylated decasaccharide sequence with
the above disaccharide building blocks is 320 (Table 6).

INSERT TABLE 6 HERE
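
The size of such a master list can be checked with a short closed-form calculation. The Python sketch below assumes that the list contains every distinct ordering of the disaccharide units, with the I/G state of the first unit left undetermined (±) and each remaining unit being either iduronic or glucuronic; the function name master_list_size is hypothetical.

```python
# Sketch: number of candidate sequences = (distinct orderings of the units)
# x 2^(n-1) iduronic/glucuronic choices for the n-1 units after the first.
from collections import Counter
from math import factorial


def master_list_size(units) -> int:
    arrangements = factorial(len(units))
    for repeats in Counter(units).values():
        arrangements //= factorial(repeats)  # identical units are interchangeable
    return arrangements * 2 ** (len(units) - 1)


if __name__ == "__main__":
    print(master_list_size("DDD47"))  # AT-III binding decasaccharide
    print(master_list_size("DDD5"))   # octasaccharide O2 of Example 2
```

Under this assumption the calculation gives 320 for the composition D, D, D, 4, 7 and 32 for D, D, D, 5, matching the counts quoted in Examples 4 and 2.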



2. Digestion with heparinase I: 

Digestion of this decasaccharide with heparinase I resulted in four fragments. The
major fragments include a decasulfated, singly acetylated octasaccharide (m/z 6419.7), a
heptasulfated, singly acetylated hexasaccharide (m/z 5842.1), a hexasulfated
tetrasaccharide (m/z 5383.1) and a trisulfated disaccharide (m/z 4805.3). Also present is
a contaminant (*), a pentasulfated tetrasaccharide. The sequence of the AT-III binding
decasaccharide has been reported to be D4-7DD on the basis of NMR spectroscopy (Y. Toida
et al., J. Biol. Chem. 271, 32040 (1996)). Such a sequence should show the appearance of a
tagged D or DD residue at the reducing end. However, we have found all the different
experiments used in the elucidation of the decasaccharide sequence to be consistent with each
other in the appearance of a 4-7 tagged product and not a D (or a DD) product. Surprisingly,
this saccharide did not contain an intact AT-III binding site, as proposed. Therefore,
confirmation of the proposed sequence was sought through the use of integral glycan
sequencing (IGS) methodology. The result of IGS agreed with our analysis. A minor
contaminant saccharide has also been found. Of the 320 possible sequences, only 52
sequences satisfied the heparinase I digestion data (Table 6(i)). The mass spectrum of the
exhaustive digestion of the decasaccharide with heparinase I showed m/z values that
corresponded to a trisulfated disaccharide and an octasulfated hexasaccharide, thereby further
reducing the list of 52 sequences to 28 sequences (Table 6(ii)).
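
The master list of 320 sequences referred to above can be reproduced by simple enumeration.
The following is a minimal sketch in Python, not the implementation used in this work. The
function name master_list is hypothetical, and the sketch assumes that the non-reducing-end
unit is written with "±" (its uronic acid isomer is not fixed by the ΔU building blocks)
while every other unit may be "+" (I) or "-" (G).

```python
from itertools import permutations, product

def master_list(composition):
    """Enumerate candidate sequences for a given composition.

    composition maps an unsigned building-block code (e.g. "D") to the
    number of times it occurs. Assumption: the non-reducing-end unit is
    left as "±"; each remaining unit is either "+" (I) or "-" (G).
    """
    units = [code for code, count in composition.items() for _ in range(count)]
    orders = sorted(set(permutations(units)))       # distinct unit orders
    sequences = []
    for order in orders:
        for signs in product("+-", repeat=len(order) - 1):
            tail = "".join(s + u for s, u in zip(signs, order[1:]))
            sequences.append("±" + order[0] + tail)
    return sequences

# 3:1:1 composition of the AT-III binding decasaccharide
decasaccharide = master_list({"D": 3, "4": 1, "7": 1})
print(len(decasaccharide))   # 20 unit orders x 16 sign patterns
```

Under these assumptions the count is 20 x 16 = 320. Each experimental constraint then
acts as a filter on this list.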

3. Digestion with heparinase II: 

To further converge on the sequence, a 'mass-tag' was used at the reducing end of the
saccharide (Δm/z of 56.1, shown as 't'). This enabled the identification of the saccharide
sequence close to and at the reducing end. Typical yields for the mass-tag labeling varied
between 80-90% as determined by CE. Treatment of the semicarbazide-tagged
decasaccharide with heparinase II resulted in the following products: m/z 5958.4
(nonasulfated hexasaccharide), m/z 5897.7 (tagged heptasulfated, singly acetylated
hexasaccharide), m/z 5380.1 (hexasulfated tetrasaccharide), m/z 5320.9 (tagged tetrasulfated
tetrasaccharide), m/z 5264.6 (tetrasulfated tetrasaccharide) and m/z 4805.0 (a trisulfated
disaccharide). The m/z values of 5320.9 and 5897.7 corresponded to a tagged tetrasulfated
tetrasaccharide and a tagged heptasulfated hexasaccharide, both containing the N-acetyl
glucosamine residue. This result indicated that ±4 (I/GHNAc,6S) is present at the reducing
end or one unit from the reducing end, thereby limiting the number of possible sequences
from 28 to 6 (Table 6(iii)).
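
The way a mass-tag exposes reducing-end fragments can be sketched as a search for peak
pairs separated by the tag shift. This is an illustrative sketch only: the function name
tagged_pairs and the 1.5 m/z matching window are assumptions, not values stated above.

```python
TAG_SHIFT = 56.1    # m/z shift of the semicarbazide mass-tag (from the text)
TOLERANCE = 1.5     # assumed matching window; not stated in the text

def tagged_pairs(peaks, shift=TAG_SHIFT, tol=TOLERANCE):
    """Return (untagged, tagged) m/z pairs separated by the tag shift."""
    peaks = sorted(peaks)
    return [(low, high)
            for i, low in enumerate(peaks)
            for high in peaks[i + 1:]
            if abs((high - low) - shift) <= tol]

# Products of the heparinase II digest of the tagged decasaccharide
heparinase_ii_products = [5958.4, 5897.7, 5380.1, 5320.9, 5264.6, 4805.0]
pairs = tagged_pairs(heparinase_ii_products)
print(pairs)
```

With this window the only matched pair is the tetrasulfated tetrasaccharide with and
without the tag, which is the fragment that places ±4 near the reducing end.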
4. Digestion with nitrous acid: 

Partial nitrous acid digestion of the tagged as well as the untagged decasaccharide
provided no additional constraints but confirmed the heparinase II data. Exhaustive nitrous
acid digestion, however, gave only the reducing end tetrasaccharide (with and without the tag)
as an unclipped product. Exhaustive nitrous acid treatment of the decasaccharide essentially
gives one tetrasulfated, singly acetylated anhydromannitol tetrasaccharide species (one tagged,
m/z 5241.5, and one untagged, m/z 5186.5). This confirmed that ±4 (I/GHNAc,6S) is one unit
away from the reducing end. Sequential use of exoenzymes uniquely resolved the isomeric
state of the uronic acid as +4 and the reducing end disaccharide to be -7, consistent with 4-7
being the key AT-III binding motif. Treatment of this tetrasaccharide with iduronidase (and
not glucuronidase) resulted in a species of m/z 5007.8 corresponding to the removal of the
iduronate residue. Further treatment with exoenzymes, only in the following order
(glucosamine 6-O sulfatase, hexosaminidase and glucuronidase), resulted in the complete
digestion of the trisaccharide. Table 6 shows the convergence of the AT-III binding
decasaccharide sequence from 320 possible sequences to 52 to 28 to 6 to the final sequence.
Thus, the sequence of the AT-III binding decasaccharide was deduced as +DDD4-7
(ΔU2SHNS,6SI2SHNS,6SI2SHNS,6SIHNAc,6SGHNS,3S,6S).

Example 5: Sequencing of a Hexasaccharide of HLGAG.

10 pM H1 was treated with 2 mM nitrous acid in 20 mM HCl at room temperature for
20 minutes such that limited degradation occurred. After 20 minutes, a two-fold molar excess
of (arg-gly)19arg in saturated matrix solution was added. 1 pmol of saccharide was spotted and
used for mass spectrometric study. All saccharides were detected as non-covalent complexes
with (arg-gly)19arg. Starting hexasaccharide was observed, as were a tetrasaccharide and a
disaccharide. Also observed was uncomplexed peptide (not shown in figures). Hereafter two
m/z values are reported. The first is the observed m/z value that corresponds to the saccharide
+ peptide. The second number, in parentheses, is the m/z of the saccharide alone, obtained by
subtracting the mass of the peptide.

After 20 minutes, nitrous acid treatment of H1 yielded starting material at m/z 5882.5
(1655.8), which corresponded to a hexasaccharide with 8 sulfates and an anhydromannitol at
the reducing end; a species at m/z 5304.1 (1077.3), which corresponded to a tetrasaccharide
with the anhydromannitol at the reducing end; and a species at m/z 4726.2 (499.4), which
corresponded to a disulfated disaccharide with the anhydromannitol at the reducing end.
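
The two-number convention above (complex m/z, then saccharide m/z in parentheses) can be
checked for internal consistency. The short sketch below only uses the three value pairs
reported in the preceding paragraph; the function name implied_peptide_mz is hypothetical.

```python
# (observed m/z of the complex, reported m/z of the saccharide alone)
reported = {
    "hexasaccharide": (5882.5, 1655.8),
    "tetrasaccharide": (5304.1, 1077.3),
    "disaccharide": (4726.2, 499.4),
}

def implied_peptide_mz(pairs):
    """Peptide contribution implied by each (complex, saccharide) pair."""
    return {name: round(complex_mz - saccharide_mz, 1)
            for name, (complex_mz, saccharide_mz) in pairs.items()}

peptide = implied_peptide_mz(reported)
print(peptide)
```

All three pairs imply essentially the same peptide contribution (they agree to within
about 0.1), as expected if a single peptide species complexes each saccharide.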

This sample was then subjected to exoenzyme analysis. Three exoenzymes were
added: iduronate 2-O sulfatase, iduronidase, and glucosamine 6-O sulfatase. The nitrous
acid sample was neutralized via addition of 1/5 volume of 200 mM sodium acetate, 1 mg/mL
BSA, pH 6.0, after which the enzymes were added. Glucosamine 6-O sulfatase was added after
digestion with the first two enzymes was complete. Final enzyme concentrations were in the
range of 20-40 milliunits/mL and digestion was carried out at 37°C for a minimum of two
hours.

Upon incubation with iduronate 2-O sulfatase and iduronidase, the hexasaccharide and
tetrasaccharide peaks were reduced in mass. The disaccharide was no longer detectable after
incubation with the enzymes. The hexasaccharide gave a new species at m/z 5627.3 (1398.8)
corresponding to loss of sulfate and iduronate. The tetrasaccharide yielded a species of m/z
5049.3 (820.8), again corresponding to loss of sulfate at the 2-O position and loss of iduronate.
These data showed that all the disaccharide building blocks contained an I2S.

Addition of glucosamine 6-O sulfatase and incubation overnight at 37°C resulted in
the production of two new species: one at m/z 5546.8 (1318.3), resulting from loss of sulfate
at the 6 position on glucosamine, and the other at m/z 5224.7 (996.2), again corresponding to a
tetrasaccharide 6-O sulfate. These data showed that, except for the reducing end
anhydromannitol-containing disaccharide unit, the other units contained HNS. The data
indicated that the sequence is DDD', the prime indicating that this sequence was originally
derived from nitrous acid degradation, unlike the other sequences, which were derived from
degradation by the heparinases.

Example 6: Sequencing of other complex polysaccharides

The sequencing approach may be readily extended to other complex polysaccharides
by developing appropriate experimental constraints. For example, the dermatan/chondroitin
mucopolysaccharides (DCMP), consisting of a disaccharide repeat unit, are amenable to a
hexadecimal coding system and MALDI-MS. Similar to what is observed for HLGAGs, there
is a unique signature of length and composition associated with a given mass in DCMP. For
instance, the minimum difference between any disaccharide and any tetrasaccharide is 139.2
Da; therefore, the length and the number of sulfates and acetates may be readily assigned for
a given DCM polysaccharide up to an octa-decasaccharide. Similarly, in the case of polysialic
acids (PSA), present mostly as homopolymers of 5-N-acetylneuraminic acid (NAN) or
5-N-glycolylneuraminic acid (NGN), the hexadecimal coding system may be easily extended to
NAN/NGN to encode the variations in the functional groups, enabling a sequencing
approach for PSA.

1. Dermatan/chondroitin family of complex mucopolysaccharides

DCMP are found in dense connective tissues such as bone and cartilage. The basic
repeat unit of the dermatan/chondroitin mucopolysaccharides (DCMP) may be represented as
-(β1→4)U2X-(α/β1→3)GalNAc,4X,6X-, where U is uronic acid and GalNAc is an N-acetylated
galactosamine. The uronic acid may be glucuronic acid (G) or iduronic acid (I) and may be
sulfated at the 2-O position, and the galactosamine (GalNAc) may be sulfated at the 4-O or
the 6-O position, thereby resulting in 16 possible combinations or building blocks for DCMP.
Like the heparinases that degrade HLGAGs, there are distinct chondroitinases and other
chemical methods available that clip at specific glycosidic linkages of DCMP and serve as
experimental constraints. Furthermore, since DCMPs are acidic polysaccharides, the MALDI-MS
techniques and methods used for HLGAGs may be readily extended to the DCMPs.

PEN scheme and mass-identity relationships for DCMP: Shown in Table 7 is the
property-encoded nomenclature (PEN) of the 16 possible building blocks of the
dermatan/chondroitin family of molecules. The sequencing approach enables one to establish
important mass-identity relationships as well as a master list of all possible DCMP sequences
from disaccharides to dodecasaccharides. These are plotted as a mass line as shown in Figure
5. As observed for HLGAGs, there is a unique signature associated with length and
composition for a given mass. As described above, the minimum difference between any
disaccharide and any tetrasaccharide was found to be 101 Daltons for HLGAGs. Interestingly,
in the case of DCMP the minimum difference between any disaccharide and any
tetrasaccharide is 139.2 Da. Therefore, the length and the number of sulfates and acetates may
be readily assigned for a given DCM polysaccharide up to an octa-decasaccharide.



I/G   2X   6X   4X   ALPH CODE   DISACC              MASS (AU)

0     0    0    0     0          I-GalNAc            379.33
0     0    0    1     1          I-GalNAc,4S         459.39
0     0    1    0     2          I-GalNAc,6S         459.39
0     0    1    1     3          I-GalNAc,4S,6S      539.45
0     1    0    0     4          I2S-GalNAc          459.39
0     1    0    1     5          I2S-GalNAc,4S       539.45
0     1    1    0     6          I2S-GalNAc,6S       539.45
0     1    1    1     7          I2S-GalNAc,4S,6S    619.51
1     0    0    0    -0          G-GalNAc            379.33
1     0    0    1    -1          G-GalNAc,4S         459.39
1     0    1    0    -2          G-GalNAc,6S         459.39
1     0    1    1    -3          G-GalNAc,4S,6S      539.45
1     1    0    0    -4          G2S-GalNAc          459.39
1     1    0    1    -5          G2S-GalNAc,4S       539.45
1     1    1    0    -6          G2S-GalNAc,6S       539.45
1     1    1    1    -7          G2S-GalNAc,4S,6S    619.51

TABLE 7

Table 7 shows the Property Encoding Numerical scheme used to code DCMPs. The
first column codes for the isomeric state of the uronic acid (0 corresponding to iduronic and 1
corresponding to glucuronic). The second column codes for the substitution at the 2-O
position of the uronic acid (0 = unsulfated, 1 = sulfated). Columns 3 and 4 code for the
substitution at the 6 and 4 positions of the galactosamine, respectively. Column 5 shows the
numeric code for the disaccharide unit, column 6 shows the disaccharide unit and column 7
shows the theoretical mass calculated for the disaccharide unit.
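
The relationship between a Table 7 code, its building block and its mass can be sketched
in a few lines. This is an illustration under stated assumptions, not the implementation of
the invention: the function name decode_dcmp is hypothetical, the per-sulfate increment of
80.06 is inferred from the differences between rows of Table 7, and oligosaccharide masses
are assumed to be the sum of the listed unit masses.

```python
BASE_MASS = 37933   # unsulfated disaccharide (379.33), in hundredths of a mass unit
SULFATE = 8006      # 80.06 per sulfate, inferred from the differences in Table 7

def decode_dcmp(code):
    """Decode a Table 7 code such as "5" or "-3" into (name, mass)."""
    uronic = "G" if code.startswith("-") else "I"      # sign gives I or G
    value = int(code.lstrip("+-"))
    s2, s6, s4 = (value >> 2) & 1, (value >> 1) & 1, value & 1
    positions = [pos for pos, on in (("4S", s4), ("6S", s6)) if on]
    name = uronic + ("2S" if s2 else "") + "-GalNAc"
    if positions:
        name += "," + ",".join(positions)
    mass = (BASE_MASS + SULFATE * (s2 + s6 + s4)) / 100
    return name, mass

print(decode_dcmp("5"))    # 2-O-sulfated iduronate, 4-O-sulfated GalNAc
print(decode_dcmp("-2"))   # glucuronate, 6-O-sulfated GalNAc

# Lightest tetrasaccharide minus heaviest disaccharide, assuming masses add
gap = (2 * BASE_MASS - (BASE_MASS + 3 * SULFATE)) / 100
print(gap)
```

Under the additivity assumption the gap evaluates to 139.15, in line with the minimum
disaccharide/tetrasaccharide difference of 139.2 Da quoted above.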

Tools as experimental constraints: Similar to the heparinases that degrade HLGAGs,
there are chondroitinases that degrade chondroitin-like and dermatan-like regions of DCMP.
The chondroitinases B, C, AC and ABC have distinct specificities with some overlap. For
the most part the chondroitinases cover the entire range of linkages found in DCMP. Several
chondroitinases have been isolated and cloned from different sources. In addition
to the enzymes, there are a few well-established chemical methods that may be used to
investigate DCMP. These include nitrous acid treatment. Thus there are adequate tools
(enzymatic and chemical) which function as 'experimental constraints' to enable DCMP
sequencing. Below we use two DCMP sequences to illustrate sequencing of DCMP.

A. Serpin HCF-2 binding DCMP hexasaccharide:

The minimum size DCMP binding to serpin HCF-2 was isolated and its composition
was determined using elaborate methods which included anion exchange chromatography,
paper electrophoresis and paper chromatography. The sequencing strategy through the
integration of PEN and MS established the identity of this serpin HCF-2 binding saccharide to
be a hexasaccharide with 6 sulfates and 3 acetates. The high degree of sulfation pointed to a
dermatan-like saccharide. Since this saccharide was derived using partial N-deacetylation and
nitrous acid treatment, it comprises a 5-membered anhydrotalitol ring at the reducing end.
Composition analysis of the saccharide may be obtained by degradation using the
chondroitinases. The composition shows the presence of ΔU2SGalNAc,4S (±5) and ΔU2SaTal4S
(aTal = anhydrotalitol; ±5') in a 2:1 ratio. This enabled the generation of a master list with 8
possible sequences as shown in Table 8a. 2-sulfatase and iduronidase treatment of the
hexasaccharide produced a shift in the mass spectrum corresponding to the loss of a sulfate
and iduronate, thereby fixing the I2S at the non-reducing end (Table 8b). In order to converge
further, chondroitinase B (which acts on iduronate residues in dermatan-like regions) was
used and a single peak in the mass spectrum corresponding to a 2-sulfated disaccharide was
observed. This led us to converge to the sequence +555'
(I2S-GalNAc,4S-I2S-GalNAc,4S-I2S-aTal4S).

TABLE 8

(a) Master list (8 sequences)

    +555'    +55-5'    +5-55'    +5-5-5'
    -555'    -55-5'    -5-55'    -5-5-5'

(b) After 2-sulfatase and iduronidase (4 sequences)

    +555'    +55-5'    +5-55'    +5-5-5'

    Chondroitinase B

    Sequence    Fragments formed
    +555'       +5 ±5 +5'
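
The first convergence step of Table 8 can be written as a filter over the master list.
The sketch below is illustrative only; it writes every sign explicitly (so +555' appears
as +5+5+5'), and the variable names are hypothetical.

```python
from itertools import product

UNITS = ["5", "5", "5'"]   # unit order is fixed: 5' (anhydrotalitol) is the reducing end

# Master list: each unit may contain iduronate ("+") or glucuronate ("-").
master = ["".join(sign + unit for sign, unit in zip(signs, UNITS))
          for signs in product("+-", repeat=len(UNITS))]

# Constraint from 2-sulfatase + iduronidase: the non-reducing end is I2S ("+").
after_exoenzymes = [seq for seq in master if seq.startswith("+")]

print(len(master), after_exoenzymes)
```

The chondroitinase B result would be applied as a further filter in the same way to reach
the single sequence +555'.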



B. Hypothetical: 

In this example a "hypothetical DCMP polysaccharide" which is more complex than
the previous example is used. Assume that MS yields a result that is interpreted to be an
octasaccharide with 8 sulfates and 4 acetates, and that the composition analysis points to three
species corresponding to ΔU2SGalNAc,4S (±5), ΔUGalNAc,6S (±2) and ΔU2SGalNAc,4S,6S (±7) in
2:1:1 relative abundance. This enables one to generate a master list, which would point to 96
possible sequences (Table 9a). It is expected that the digestion of the saccharide sample with
chondroitinase AC would result in two products with masses that would correspond to two
tetrasulfated tetrasaccharide units and thereby reduce the master list to 4 possible sequences
(Table 9b). Complete deamination using hydrazinolysis and nitrous acid treatment would
result in 3 peaks, two corresponding to a disulfated disaccharide and the third corresponding
to a trisulfated disaccharide. Treatment of the degraded products with 2-sulfatase and
iduronidase (and not glucuronidase) should result in peaks that correspond to the loss of
sulfate and iduronate residues. This would enable the identification of the isomeric state of 5
and 7, thereby converging the master list to one sequence, ±55-27
(ΔU2S-GalNAc,4S-I2S-GalNAc,4S-G-GalNAc,6S-I2S-GalNAc,4S,6S).



TABLE 9

(a) Master list of 96 sequences

(b) Chondroitinase AC

    Sequence     Fragments
    ±55-27       ±55  ±27
    ±55-2-7      ±55  ±2-7
    ±5-5-27      ±5-5 ±27
    ±5-5-2-7     ±5-5 ±2-7

    Complete deamination (nitrous acid treatment), followed by 2-sulfatase, iduronidase

    Sequence     Fragments
    ±55-27       ±5' +5' -2' +7'



It is important to reiterate that, similar to what was developed for HLGAG, distinct or 
additional 'convergence strategies or experimental constraints' may be used to arrive at the 
'unique' solution for DCMP. 

2. Polysialic Acid 

Polysialic acids are linear complex polysaccharides found as a highly regulated
post-translational modification of the neural cell adhesion molecule in mammals and are
present mostly as homopolymers of 5-N-acetylneuraminic acid (NAN) or 5-N-glycolylneuraminic
acid (NGN). The monomeric units of NAN and NGN are linked by α2-8 glycosidic linkages,
and may be modified at the 4-O, 7-O, and 9-O positions. The major modification is
acetylation. In addition, much rarer modifications including sulfation and lactonization occur
at the 9-O position. A deaminated form of neuraminic acid, namely
5-deamino-3,5-dideoxyneuraminic acid (KDN), has also been discovered. The PEN-MS sequencing
approach is extended to polysialic acids, and using NAN and NGN units we illustrate how this is
achieved.

PEN scheme and mass-identity relationships for PSA: PSA is comprised of two
different monomeric repeats, with variations in the modification of each unit. The flexibility
of the PEN enables easy adaptation to a monomeric repeat unit for PSA from the dimeric
repeats for HLGAG and DCMP. The PEN scheme for PSA is shown in Table 10. The
sequencing approach establishes important mass-identity relationships as well as a master list
of all the combinations of monomeric units for NAN and NGN. The mass-lines for polymeric
units of NAN and NGN are shown in Fig. 6A and B. Note that there is a considerable overlap
in masses observed for the higher order oligomers of both NAN and NGN (Figure 6A and B).
The minimum difference in the masses between an n-mer and an (n+1)-mer stabilizes at 3.01 Da
for NAN and 13 Da for NGN, as we go to tetra-, penta- and hexasaccharide, thereby providing a
safe margin for detection of these fragments using MS.

TABLE 10 



NAN/NGN   9X   7X   4X   Code   Saccharide unit      Mass

0         0    0    0     0     NAN                  309.28
0         0    0    1     1     NAN 4Ac              351.32
0         0    1    0     2     NAN 7Ac              351.32
0         0    1    1     3     NAN 4Ac,7Ac          393.36
0         1    0    0     4     NAN 9Ac              351.32
0         1    0    1     5     NAN 4Ac,9Ac          393.36
0         1    1    0     6     NAN 7Ac,9Ac          393.36
0         1    1    1     7     NAN 4Ac,7Ac,9Ac      435.40
1         0    0    0    -0     NGN                  325.27
1         0    0    1    -1     NGN 4Ac              367.32
1         0    1    0    -2     NGN 7Ac              367.32
1         0    1    1    -3     NGN 4Ac,7Ac          409.36
1         1    0    0    -4     NGN 9Ac              367.32
1         1    0    1    -5     NGN 4Ac,9Ac          409.36
1         1    1    0    -6     NGN 7Ac,9Ac          409.36
1         1    1    1    -7     NGN 4Ac,7Ac,9Ac      451.40



Shown in Table 10 is the Property Encoded Numerical scheme for PSA. Column 1
codes for whether the monomeric unit is NAN or NGN. Columns 2, 3 and 4 code for the
variations in the 9, 7 and 4 positions respectively, where 1 corresponds to acetylated and 0
corresponds to unacetylated. Column 5 shows the numeric code for the PSAs. The codes -0 to
-7 were used instead of 8 to F, so that the number codes for the variability in acetylation and
the sign indicates whether the unit is NAN or NGN. Column 6 lists the monosaccharide
represented by the code in column 5. Column 7 lists the theoretical mass calculated for the
monomeric units shown in column 6.

The mass-line for the combinations of substituted/unsubstituted NAN-containing
monomeric units in PSA is shown in Figure 6A. The X-axis represents the calculated masses
for monosaccharides to hexasaccharides. Shown on the Y-axis is the number of fragments of a
particular length and composition that exist for a given mass. The values 150-190 were
omitted to improve the clarity of the other peaks. The minimum difference between any
monosaccharide and any disaccharide is 165.2 Da, between any di- and any trisaccharide is
39.03 Da, between any tri- and any tetrasaccharide is 39.03 Da, and 3.01 Da for all higher order
saccharides.
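
The first two gaps on the mass-line can be estimated directly from Table 10. The sketch
below is a rough check under assumptions that are not stated in the text: each O-acetyl
group adds 42.04 (inferred from the NAN rows of Table 10), each glycosidic linkage removes
one water (18.015), and every unit carries between zero and three acetyl groups. The
function names are hypothetical.

```python
NAN = 309.28      # unmodified NAN monomer (Table 10)
NGN = 325.27      # unmodified NGN monomer (Table 10)
ACETYL = 42.04    # per O-acetyl group, inferred from the NAN rows of Table 10
WATER = 18.015    # assumed loss per glycosidic linkage; not stated in the text

def mass_range(n, monomer):
    """Lowest and highest mass of an n-mer (0 to 3 acetyl groups per unit)."""
    lowest = n * monomer - (n - 1) * WATER
    highest = lowest + 3 * n * ACETYL
    return lowest, highest

def gap(n, monomer):
    """Lightest (n+1)-mer minus heaviest n-mer."""
    return mass_range(n + 1, monomer)[0] - mass_range(n, monomer)[1]

for label, monomer in (("NAN", NAN), ("NGN", NGN)):
    print(label, round(gap(1, monomer), 2), round(gap(2, monomer), 2))
```

Under these assumptions the mono/di and di/tri gaps come out within about 0.1 Da of the
values quoted for Figures 6A and 6B (165.2 and 39.03 Da for NAN; 181.2 and 55.03 Da for
NGN).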

The mass-line for the combinations of substituted/unsubstituted NGN monomeric units
in PSA is shown in Figure 6B. The X-axis represents the calculated masses for
monosaccharide to hexasaccharide. Shown on the Y-axis is the number of fragments of a
particular length and composition that exist for a given mass. The values 150-190 were
omitted to improve the clarity of the other peaks. The minimum difference between any
monosaccharide and any disaccharide is 181.2 Da, between any di- and any trisaccharide is
55.03 Da, and 13 Da for higher order saccharides.

Tools as experimental constraints: There are several tools and detection methods
available for studying PSAs. Based on the properties of the building blocks of PSA, this class
of linear polysaccharides is amenable to MS. Methods of purifying PSA polymers and
obtaining composition using HPLC, CE and mass spectrometry have very recently been
established. Enzymatic tools from various sources have been used to study PSA extensively,
notably the bacterial exosialidase, which cleaves PSA polymers processively from the
non-reducing end, and the bacteriophage-derived endoneuraminidase, which clips endolytically
both the NAN- and NGN-containing PSA linear polysaccharides. In addition to these enzymes,
chemical methods such as hydrazinolysis followed by nitrous acid treatment and periodate
oxidation followed by sodium borohydride treatment may be used as tools to degrade PSA
polysaccharides into smaller polysaccharides.

Example 7: Variation of experimental conditions resulting in alteration of enzymatic 
reactions and its effect on the methods of the invention. 

Secondary specificities of the heparinases have been observed, especially under
exhaustive degradation conditions. As a part of ongoing investigations into the enzymology of
heparinases, the relative rates of cleavage of I- and G-containing sites by heparinase I and III
with defined substrates under different conditions have been measured. For instance,
heparinase III cleaves at both I- and G-containing linkages and not at I2S [H. E. Conrad,
Heparin Binding Proteins (Academic Press, San Diego, 1998).]. However, under the reaction
conditions used in this study, there is a dramatic (8-10 fold) difference in the rates of
cleavage, with I-containing linkages being clipped more slowly than G-containing linkages
(Figure 7A). Figure 7A shows cleavage by recombinant heparinase III of tetrasaccharides
containing either G, I or I2S linkages. Each reaction was followed by capillary
electrophoresis. With these substrates, heparinase III does not cleave I2S-containing
glycosidic linkages, and cleaves G-containing linkages roughly 10 times as fast as
I-containing linkages. Under the "short" conditions of digest it is expected that only
G-containing saccharides are cleaved to an appreciable extent. [Conditions for enzymatic digest
of HLGAG oligosaccharides were set forth above. Briefly, digests were designated as either
"short" or "exhaustive". Short digests were completed with 50 nM enzyme for 10 minutes.
Exhaustive digests were completed using 200 nM enzyme for either four hours or overnight.
Partial nitrous acid cleavage was completed using a modification of published procedures.
Briefly, to an aqueous solution of saccharide was added a 2x solution of sodium nitrite in HCl
such that the concentration of nitrous acid was 2 mM and HCl was 20 mM. The reaction was
allowed to proceed at room temperature with quenching of aliquots at various time points via
the addition of 1 µL of 200 mM sodium acetate, 1 mg/mL BSA, pH 6.0. Exhaustive nitrous
acid treatment was completed by reacting saccharide with 4 mM nitrous acid in HCl overnight
at room temperature. In both cases, it was found that the products of nitrous acid cleavage
could be sampled directly by MALDI without further cleanup and without the need to reduce the
anhydromannose residues to anhydromannitol. The entire panel of HLGAG-degrading
exoenzymes was purchased from Oxford Glycosystems (Wakefield, MA) and used as
suggested by the manufacturer.] For example, with the hexasaccharide ΔUHNH,6SGHNSIHNAc
(which contains both I and G in a minimally sulfated region) cleavage occurs only at the G
under "short" digest conditions as shown in Table II.



Table II

Species                m/z (+ Peptide)    Observed

ΔUHNH,6SGHNSIHNAc      5442.1             yes
ΔUHNSIHNAc             5023.6             yes
ΔUHNH,6SGHNS           5061.7             no




Heparinase II was incubated with the hexasaccharide ΔUHNH,6SGHNSIHNAc and only
cleavage at the G and not the I was observed. Furthermore, we have found that the degree of
sulfation does affect the kinetics of heparinase III degradation of oligosaccharides [S. Ernst et
al., Crit. Rev. Biochem. Mol. Biol. 30, 387 (1995); S. Yamada et al., Glycobiology 4, 69
(1994); U.R. Desai, H.M. Wang, R.J. Linhardt, Biochemistry 32, 8140 (1993); R.J. Linhardt et
al., Biochemistry 29, 2611 (1990).]. In the case of heparinase I, this enzyme does not clip
either I- or G-containing glycosidic linkages within the context of our experimental
procedures, whereas it readily clips I2S-containing polysaccharides (Figure 7B). Figure 7C
shows the same study as completed in (A) except heparinase I was used instead of heparinase
III. With heparinase I, cleavage only occurs at I2S-containing linkages but not before I or G.
There is only one report of heparinase I clipping G2S-containing linkages [S. Yamada, T.
Murakami, H. Tsuda, K. Yoshida, K. Sugahara, J. Biol. Chem. 270, 8696 (1995).], which was
tested with two tetrasaccharide substrates, and the experiments were performed under
conditions which are kinetically very different from the 'short' heparinase I digestion
presented here.

Quite a few factors have severely limited and complicated prior art studies and
interpretation of heparinase substrate specificity experiments. First, not only is a homogeneous
substrate preparation difficult, but analyzing the substrates and products has also been very
challenging. Analysis has primarily relied on co-migration of the saccharides with known
standards, and as others and we have observed, oligosaccharides with different sulfation
patterns do co-migrate, complicating unique assignments. Further, some oligosaccharides
used in previous studies to assign substrate specificity for the heparinases were not
homogeneous, complicating analysis. The development of the MALDI-MS procedure of the
invention has enabled rapid and accurate determination of the saccharides. The second
problem is the preparation of pure wild-type heparinases from the native host. The wild-type
heparinase is isolated from Flavobacterium heparinum, and this organism produces several
complex polysaccharide-degrading enzymes, which often copurify with each other. For
example, when examining the kinetics of heparinase III, we found that a commercial source of
heparinase III was able to degrade the supposedly non-cleavable ΔU2SHNS,6SI2SHNS,6S.
Furthermore, MS and CE analysis of the products indicated that one was specifically 2-O
desulfated, suggesting a sulfatase contamination. Recombinant heparinase III produced and
purified in our laboratory (and not having contamination with other heparin-degrading
enzymes) does not cleave ΔU2SHNS,6SI2SHNS,6S, as expected. Thus, different enzyme
preparations, differences in digestion conditions, differences in substrate size and
composition, and often contaminating substrates, taken together with assignments based on
co-elution, not only make comparison of data very difficult but have also led to contradictory
findings.

Regardless of the outcome of heparinase substrate specificities, there are other
methods that may be used to extract the isomeric state of the uronic acid [I or G or I2S or G2S].
The uronic acid component of each disaccharide unit may be unambiguously ascertained by
completing compositional analysis after exhaustive nitrous acid treatment. By this method,
compositional analysis of given oligosaccharides may be accomplished and the presence of
G2S-, I2S-, I- and G-containing building blocks assessed. With this information, rapid
convergence to a single sequence could be completed by judicious application of the
heparinases (regardless of their exact substrate specificity), since cleavage would give mass
information on either side of the cleavage site. Thus, in the octasaccharide (example 1) case,
application of exhaustive nitrous acid would yield 1x ΔUMan6S, 2x I2SMan6S and 1x GMan6S.
Then, digestion of this octasaccharide, after tagging, with heparinase III under any conditions
(forcing or non-forcing) would result in the formation of a hexasaccharide (m/z 5958.7) and a
disaccharide, immediately fixing the sequence. A similar sequence of events may be used
with heparinase I to converge to a single sequence for the octasaccharide.

While there are caveats to the use of any one particular system for sequence analysis,
whether the system is chemical degradation or enzymatic analysis, the sequencing strategy
presented here is not critically dependent on any single technique. One of the major strengths
of the sequencing strategy of the invention is the flexibility of our approach and the
integration of MALDI and the coding scheme, which make it possible to adapt to different
experimental constraints [For example, the recently cloned mammalian heparanase is another
possible experimental constraint. M.D. Hulett et al., Nat. Med. 5, 793 (1999); I. Vlodavsky et
al., Nat. Med. 5, 803 (1999).]. As stated, additional or different sets of experimental
constraints may be used not only to arrive at a unique solution but also to
validate or confirm the solution from a given set of experimental constraints.

Example 8: Methods for identifying protein-polysaccharide interactions and improved 
methods for sequencing. 

To identify HLGAG sequences that bind to a particular protein, the most common
methodology involves affinity fractionation of oligosaccharides using a particular HLGAG
subset, namely porcine intestinal mucosa heparin. Enzymatically or chemically derived
heparin oligosaccharides of a particular length are passed over a column of immobilized
protein. After washing, the bound fraction is eluted using high salt to disrupt interactions
between the sulfates on the polysaccharide and basic residues on the protein; interactions
which are crucial for binding. Eluted oligosaccharides are then characterized, typically by
NMR. In this manner, sequences that bind to a number of proteins, including antithrombin III
(AT-III), basic fibroblast growth factor (FGF-2), and endostatin have been identified.

While rigorous and well tested, this approach suffers from a number of limitations.
First, column chromatography requires large (milligram) amounts of material for successful
analysis. Of the entire family of HLGAGs, only heparin is available in these quantities.
However, heparin, due to its high sulfate content, contains a limited number of sequences,
biasing the selection procedure. Thus, there is no opportunity to sample or select for unusual
sequences that might in fact bind with high affinity. In vivo, HLGAG-binding proteins sample
and bind to the more structurally diverse heparan sulfate (HS) chains of proteoglycans at the
cell surface, where heparin-like sequences (i.e., sequences with a high degree of sulfation) do
not always predominate. Heparin, while structurally related to HS, is present in vivo only in
mast cells. For these reasons, heparin is not always an appropriate analog of cell surface HS,
and in fact, the exclusive use of heparin in affinity fractionation experiments has created
confusion in the field. One example illustrates this point. FGF-2 binds to a specific subset of
heparan sulfate sequences that contain a critical 2-O sulfated iduronate residue. Column
chromatography has separated a high affinity binder of FGF-2, the sequence(s) of which have
been identified as oligosaccharides containing the predominant trisulfated disaccharide
[I2SHNS,6S]n (n=3-6). However, rigorous examination of the crystal structures of FGF-2,
including co-crystals of FGF with HLGAG oligosaccharides, indicates that only three contacts
between sulfates and basic residues on FGF-2 are important for high affinity binding.

Using the mass spectrometric approach of the invention we have developed an
improved way to identify polysaccharide-protein interactions. The advantage of this approach
is that it is highly sensitive, requiring only picomoles of material, which may be isolated from 
in vivo sources. As described below the approach may be used for the identification and 
sequencing of oligosaccharides that bind to proteins using picomoles of material. As a proof 
of concept, we show herein that this novel methodology is functionally equivalent to the
established column affinity fractionation method for three proteins: FGF-1, FGF-2 and AT-III,
using heparin oligosaccharides as a model system. Furthermore, we show herein that this 
system can be extended such that heparan sulfate isolated from the cell surface can be used to 
isolate binding proteins, demonstrating that, for the first time, unbiased, biologically relevant 
HLGAGs can be used to identify binding sequences. 

Methods:

Protein preparation and immobilization. ATIII was incubated overnight with 
excess porcine mucosal heparin, then biotinylated with EZ-link sulfo-NHS biotin (Pierce). 
Canon NP Type E transparency film was taped to the MALDI sample plate and used as a 
protein immobilization surface. FGF-1 and FGF-2 were immobilized by spotting 1 µl of
aqueous solution on the film and air-drying. ATIII was immobilized by first drying 4 µg
neutravidin on the film surface, then adding biotinylated ATIII to the neutravidin spot. 
Heparin was removed by washing ten times with 1M NaCl and ten times with water. 

Saccharide binding, selection and analysis. Saccharides were derived from a partial 
digest of porcine mucosal heparin by heparinase I. The hexasaccharide fraction was obtained
by size exclusion chromatography on Biogel P-6 and lyophilized to dryness. Saccharides were
bound to immobilized proteins by spotting 1 µl of aqueous solution on the protein spot for at
least five minutes. Unbound saccharides were removed by washing with water fifteen times.
For selection experiments, the spot was washed ten times with various NaCl concentrations,
followed by ten water washes. Caffeic acid matrix in 50% acetonitrile with 2 pmol/µl (RG)19R
was added to the spot prior to MALDI analysis. All saccharides were detected as noncovalent
complexes with (RG)19R using MALDI parameters described herein.

Saccharide digestion by heparinase I or III. Saccharides selected for FGF-2 
binding were digested with heparinases I or III by spotting 8 µg of enzyme in water after
selection was completed. The spot was kept wet for the desired digestion time by adding
water as necessary. Caffeic acid matrix with 2 pmol/µl (RG)19R was added to the spot for
MALDI analysis. 

Isolation, Purification, and Selection of FGF binders from SMC heparan sulfate.
Bovine aortic smooth muscle cells (SMCs) were grown to confluency. Cells were washed
twice with PBS and then 200 nM heparinase III was added for 1 hr. The supernatant was
heated to 50°C for 10 minutes to inactivate heparinase III and filtered. To remove
polynucleotide contamination, the samples were treated with DNAse and RNAse at room
temperature overnight. Heparan sulfate was isolated by binding to a DEAE filter, washing
away unbound material, and eluting with 10 mM sodium phosphate, 1 M NaCl, pH 6.0. The
material was then concentrated and buffer exchanged into water using a 3,000 MWCO
membrane. The retentate was lyophilized and reconstituted in water. 100 nM heparinase II
was added and aliquots were taken at 5, 10, 20, and 30 minutes post-addition. 1 µl was
spotted on FGF. After drying, the sample was washed, 2 pmol/µl (RG)19R in matrix was
added, and the sample was analyzed as outlined above.

Results:

Saccharide binding to FGF-2 and FGF-1. As a first step towards the development of
a viable MALDI selection procedure, the FGF system, using its prototypic members, viz. FGF-1
and FGF-2, was selected. Initial experiments involved the use of a purified polysaccharide
(Hexa 1 of Table 12) that is known to bind with high affinity to FGF. With FGF-2, we found
that Hexa 1 bound and was detected, even with a salt wash of 0.5M NaCl, consistent
with the known affinity of Hexa 1 for FGF-2. In addition, when an equimolar mixture of Hexa
1 and Hexa 2 (a low affinity binder) was applied to FGF-2 and washed with 0.2M NaCl to
eliminate nonspecific binding, only Hexa 1 was observed. Together, these results indicate
that, under the conditions of the experiment, immobilized FGF-2 retained the same
binding specificity as FGF in solution. Further demonstrating that binding specificity was
intact, heat denaturation of FGF resulted in the detection of no saccharide binders.



Saccharide    Sequence
Hexa 1        (a) ±DDD or (b) DDMan 6S
Hexa 2        ±D4-7
Penta 1

Table 12

FGF affinity fractionation of a hexasaccharide mixture derived from the enzymatic
depolymerization of heparin was used to enrich for FGF binders. To determine whether
specific binders could be selected from a more complex mixture using our methodology, a
hexasaccharide fraction derived from incomplete heparinase I digestion of porcine intestinal
mucosa heparin was spotted on immobilized FGF. At least five unique structures were
detected in the unfractionated hexasaccharide mixture. Upon a salt wash, only two structures,
8- and 9-sulfated hexasaccharides, remained. Importantly, the same results could alternately
be achieved by enriching the spot for specific binders and competing off low affinity binders.
FGF-1, which has been shown to have binding properties similar to those of FGF-2, could also
select for the octa- and nonasulfated hexasaccharides from a mixture.

Sequencing saccharides on the MALDI surface. The highly sensitive sequencing
methodology of the invention was used to test whether we could derive structural information
on FGF high affinity binders on target. The octa- and nonasulfated saccharides were subjected
to enzymatic and chemical depolymerization. After saccharide selection, the saccharide
sample was depolymerized by heparinase I to obtain sequence information. The nonasulfated
hexasaccharide was reduced to a single trisulfated disaccharide, indicating that this saccharide
is a repeat of [I2S-HNS,6S]. Digestion of the octasulfated hexasaccharide yielded the trisulfated
disaccharide and a pentasulfated tetrasaccharide. That this tetrasaccharide contains an
unsulfated uronic acid was confirmed by heparinase III cleavage, which resulted in the
disappearance of the tetrasaccharide. Our sequencing assignments were confirmed
by isolating the octa- and nonasulfated hexasaccharides and sequencing them using the methods
described herein. Thus, the sequence of the nonasulfated hexasaccharide is ±DDD
(ΔU2S-HNS,6S-I2S-HNS,6S-I2S-HNS,6S) and the sequence of the octasulfated hexasaccharide is ±DD-5.
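
The sulfate bookkeeping behind these assignments can be checked with a short sketch. The Python below is illustrative only and is not part of the described method; the `Fragment` type and the `totals` helper are hypothetical names, and the fragment sizes and sulfate counts are the ones stated in the preceding paragraph.

```python
from collections import namedtuple

# A digestion product described by its size (monosaccharide units)
# and its number of sulfate groups. Hypothetical helper type.
Fragment = namedtuple("Fragment", ["sugars", "sulfates"])

TRISULFATED_DISACCHARIDE = Fragment(sugars=2, sulfates=3)
PENTASULFATED_TETRASACCHARIDE = Fragment(sugars=4, sulfates=5)


def totals(fragments):
    """Sum chain length and sulfate count over a list of fragments."""
    return (
        sum(f.sugars for f in fragments),
        sum(f.sulfates for f in fragments),
    )


# Nonasulfated hexasaccharide: three copies of the trisulfated disaccharide.
nonasulfated_digest = [TRISULFATED_DISACCHARIDE] * 3
# Octasulfated hexasaccharide: one disaccharide plus one tetrasaccharide.
octasulfated_digest = [TRISULFATED_DISACCHARIDE, PENTASULFATED_TETRASACCHARIDE]

for name, digest in [
    ("nonasulfated", nonasulfated_digest),
    ("octasulfated", octasulfated_digest),
]:
    sugars, sulfates = totals(digest)
    print(f"{name}: {sugars} sugars, {sulfates} sulfates")
```

Both digests add back up to a hexasaccharide, with nine and eight sulfates respectively, which is the consistency one would expect if the fragment assignments are correct.
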

Saccharide Binding to Antithrombin-III. ATIII is heavily glycosylated; therefore, we
anticipated that it would not bind well to the MALDI plate. As an alternative strategy, avidin
was immobilized on the plate and biotinylated AT-III was bound to the avidin. The ATIII 
biotinylation reaction was carried out in the presence of heparin to protect the protein's 
binding site for HLGAG oligosaccharides. After washing off the complexed heparin, Penta 1,
which contains an intact AT-III pentasaccharide binding sequence, was used to verify that the
protein was immobilized on the surface and was able to bind saccharides. Penta 1 binding to
ATIII was observed up to washes of 0.5M NaCl, consistent with it being a strong binder to
ATIII.
Furthermore, this binding is also specific. Introduction of a solution of Hexa 1, Hexa 2,
and Penta 1 to immobilized ATIII followed by a 0.2 M salt wash to remove non-specific
binders resulted in signal only for Penta 1. Interestingly, there was no signal from Hexa 2, which
contains a partially intact ATIII binding site, suggesting that, under our selection conditions,
only sequences with a full binding site will be selected for.

Selection of FGF-2 Binders in SMC HS. Heparan sulfate at the cell surface of SMCs is 
known to contain high affinity sites for FGF binding. In an effort to extend our initial studies 
with highly sulfated heparin, we sought to identify high affinity FGF binders in heparan 
sulfate proteoglycans at the cell surface of SMCs. To this end, SMCs were treated with either
heparinase I or heparinase III and the HLGAGs isolated and purified. Consistent with the
known substrate specificity of the enzymes, the composition of released fragments is
different. Fragments were then treated with heparinase II to reduce them in size. At certain
time points, the digest was spotted on FGF-2 and the selection process was carried out as
outlined above. Consistent with our findings with heparin, a single hexasaccharide was
identified to be a high affinity binder for FGF-2, namely the nonasulfated hexasaccharide with
a sequence ±DDD.

The above methodology describes an alternative protocol for the selection of
saccharide binders to proteins. This methodology has been applied towards the identification
of oligosaccharides derived from heparin that bind to two well-established systems, FGF and
ATIII. As shown, this procedure produces results identical to those of the more established
methodology of affinity fractionation. For FGF-1 and FGF-2, high affinity binders can be
selected out of a pool of similar saccharides. In addition, ATIII can select for high
affinity binders over binders that contain only a partial binding site.

This methodology has a number of critical advantages over prior art strategies. First,
it is possible to derive sequence information from the bound saccharides directly on a target.
Second, and more substantially, the analysis with both FGF and ATIII required only
picomoles of material for both the protein and saccharide. Such an advance makes it feasible
to use the more biologically relevant HS isolated from the cell surface as substrates, rather
than highly sulfated heparin from mast cells. Finally, while the Example demonstrated this
technique for the chemically complex and information dense HLGAGs, it is widely applicable
towards identifying other polysaccharide-protein interactions.




Example 9: Methods for identifying branching and methods for sequencing branched 
polysaccharides. 

Increasing evidence exists that glycosylation patterns are highly influenced by the 
phenotype of the cell. With the onset of disease, it has been noted that there are changes in
glycan structure, especially in the degree of branching. For instance, in pathogenic versus
normal prion proteins, there is a decrease in levels of glycans with bisecting GlcNAc residues
and increased levels of tri- and tetraantennary structures. By judicious application of
enzymatic and chemical degradation, branched chains may also be identified.

MS Analysis of Complex Glycan Structures: As shown in Figure 8, the extended core
structures were enzymatically generated from complex N-glycan structures and
identified. MALDI-MS analysis was performed on the extended core structures derived from
enzymatic treatment of a mixture of bi- and triantennary structures. 1 pmol of each saccharide
was subjected to digestion with an enzyme cocktail that included sialidase from A. ureafaciens
and β-galactosidase from S. pneumoniae. The mass signature of 1462.4 indicates that one of
the structures is biantennary with a core fucose moiety, while the mass signature of 1665.8 is
indicative of a triantennary structure, also with a core fucose. [O] = mannose; [0]= fucose;
[□]= N-acetylglucosamine; [□]= galactose; and [A]=N-acetylneuraminic acid. 
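
As a rough consistency check on these assignments, the expected mass of each extended core structure can be estimated from its monosaccharide composition. The sketch below makes several assumptions that are not stated in the text: it uses average residue masses from standard reference tables, it treats the signals as singly deprotonated [M-H]- ions, and it takes the compositions to be three mannose, one fucose, and four (biantennary) or five (triantennary) N-acetylglucosamine residues. The function and variable names are hypothetical.

```python
# Average residue masses in Da (monosaccharide minus water).
# Standard reference values, not taken from the text.
RESIDUE_MASS = {
    "Hex": 162.1406,     # mannose or galactose
    "HexNAc": 203.1925,  # N-acetylglucosamine
    "Fuc": 146.1412,     # fucose
    "NeuAc": 291.2546,   # N-acetylneuraminic acid
}
WATER = 18.0153
PROTON = 1.0073


def deprotonated_mz(composition):
    """Average m/z of the singly charged [M-H]- ion for a composition."""
    neutral = WATER + sum(RESIDUE_MASS[r] * n for r, n in composition.items())
    return neutral - PROTON


# Assumed compositions of the two extended core structures.
biantennary_core = {"Hex": 3, "HexNAc": 4, "Fuc": 1}
triantennary_core = {"Hex": 3, "HexNAc": 5, "Fuc": 1}

for name, comp, observed in [
    ("biantennary", biantennary_core, 1462.4),
    ("triantennary", triantennary_core, 1665.8),
]:
    calc = deprotonated_mz(comp)
    print(f"{name}: calculated {calc:.1f}, observed {observed}")
```

Under these assumptions both calculated values fall within one mass unit of the reported signatures, and the two structures differ by one N-acetylglucosamine residue, as expected for an additional antenna.
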

MALDI-MS sequencing of the N-linked polysaccharide of PSA: Next, rapid
sequencing of the glycan structure of PSA from normal prostate tissue was performed (Figure
9). Figure 9 shows data arising from MALDI-MS microsequencing of the PSA polysaccharide
structure. MALDI-MS was completed using 500 fmol of saccharide. Analysis was completed
with a saturated aqueous solution of 2,5-dihydroxybenzoic acid with 300 mM spermine as an
additive. Analytes were detected in the negative mode at an accelerating voltage of 22 kV. 1
µL of matrix was added to 0.5 µL of aqueous sample and allowed to dry on the target. (A)
MS of the intact polysaccharide structure. Peaks marked with an asterisk are impurities, and
the analyte peak is detected both as M-H (m/z 2369.5) and as a monosodiated adduct
(M+Na-2H, m/z 2392.6). (B) Treatment of [A] with sialidase from A. ureafaciens. 10 pmol of
saccharide was incubated with enzyme overnight at 37°C in 10 mM sodium acetate pH 5.5
according to the manufacturer's instructions. Two new saccharides were seen, the first at m/z
2078, corresponding to the loss of one sialic acid moiety, and the second at m/z 1786.9,
corresponding to the loss of two sialic acids from the non-reducing end. (C) Digest of [B] with
galactosidase from S. pneumoniae. Digest procedures were completed essentially as
described above. A single product at m/z 1462.8 indicated that two galactose residues were
removed upon treatment of [B] with the enzyme. (D) Digest of [C] with
N-acetylhexosaminidase from S. pneumoniae. One product was observed as both M-H (m/z
1056.3) and M+Na-2H (m/z 1078.1), corresponding to the loss of two N-acetylhexosamine
units from [C]. A table of the analysis scheme with schematic structure and theoretical
molecular masses is presented in the center of Figure 9. Shown are the parent polysaccharide
and enzymatically derived products seen in this analysis. [O] = mannose; [0]= fucose; [□]= 
N-acetylglucosamine; [□]= galactose; and [A]=N-acetylneuraminic acid. 
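
The residue counts in panels (B) through (D) follow from dividing each observed mass difference by the mass of the residue that the enzyme removes. A minimal sketch of that arithmetic is given below, assuming average residue masses from standard reference tables (not taken from the text) and a hypothetical tolerance of 1 Da; the m/z values are those quoted in the caption, and the function name is illustrative.

```python
# Average residue masses in Da; standard reference values, not from the text.
RESIDUE_MASS = {"NeuAc": 291.2546, "Hex": 162.1406, "HexNAc": 203.1925}

# (enzyme, m/z before, m/z after, residue released), m/z values from Figure 9.
STEPS = [
    ("sialidase", 2369.5, 1786.9, "NeuAc"),
    ("galactosidase", 1786.9, 1462.8, "Hex"),
    ("N-acetylhexosaminidase", 1462.8, 1056.3, "HexNAc"),
]


def residues_lost(before, after, residue, tolerance=1.0):
    """Return (count, error in Da) for the residues removed in one digest."""
    delta = before - after
    count = round(delta / RESIDUE_MASS[residue])
    error = delta - count * RESIDUE_MASS[residue]
    if abs(error) > tolerance:
        raise ValueError(f"mass shift {delta:.1f} is not a multiple of {residue}")
    return count, error


results = {}
for enzyme, before, after, residue in STEPS:
    count, error = residues_lost(before, after, residue)
    results[enzyme] = count
    print(f"{enzyme}: {count} x {residue} (error {error:+.2f} Da)")
```

Each step resolves to two residues within the assumed tolerance, which matches the interpretation given in the caption.
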

Studies of the intact polysaccharide via NMR (large quantities of PSA were required 
for this study) yielded sequence information of the glycan [Belanger, A., van Halbeek, H.,
Graves, H.C.B., Grandbois, K., Stamey, T.A., Huang, L., Poppe, I., and Labrie, F., Prostate,
1995. 27: p. 187-197]. Similar to other N-linked glycoproteins, as stated above, PSA contains
a core biantennary branched motif. Extending from each mannose arm of PSA is a
trisaccharide unit. Together these modifications indicated an expected molecular mass of 2370
Da for the intact polysaccharide. Using MALDI-MS and an exoglycosidase array we have
sequenced the putative structure for the N-linked polysaccharide on PSA (Figure 9).
Analysis of the intact polysaccharide yields a molecular mass of 2370 Da (Figure 9A), 
identical to the predicted molecular mass based on its structure. In fact for all structures and 
enzymatic products derived from them, a mass accuracy of less than one Dalton is realized. 
In initial studies, we had found that maximum sensitivity was obtained with
2,5-dihydroxybenzoic acid as the matrix with spermine as an additive [Mechref, Y. and M. V.
Novotny, Matrix-assisted laser desorption/ionization mass spectrometry of acidic
glycoconjugates facilitated by the use of spermine as a co-matrix. J Am Soc Mass Spectrom,
1998. 9(12): p. 1293-302.]. In this case, oligosaccharides were detected as negative ions. As
outlined above, these conditions yielded maximal sensitivity (a limit of detection of around
500 fmol or about 1.5 ng) and also a homogeneous signal, which is free of detectable adducts.
Of note is the fact that negative mode detection makes the analysis of sialic acid-containing
pendant arms possible, but detection can also be done in the positive mode with different
matrix conditions. Treatment of the polysaccharide with sialidase (specific cleavage of
2Neuα→6,8 linkages) resulted in a mass decrease of 618 Da consistent with the cleavage of
two sialic acid residues (Figure 9B). Treatment of this saccharide with β-galactosidase
resulted in a further 360 Da decrease in mass, confirming the presence of two galactose
residues located proximate to the sialic acids (Figure 9C). Importantly, when the asialo
structure of Figure 9B was treated with another enzyme besides β-galactosidase, no reduction
in mass was observed, confirming the identity of these units as β-linked galactose residues.
Via systematic application of the exoglycosidases, we can "read through" the entire sequence 
of the putative glycan structure of PSA. In addition, not only can we "read through" the 
structure, but our methodology was able to complete the analysis using submicrogram
amounts of material. Also, since at every step of "reading" the sequence we determined the
mass, we had an internal control to ensure that our assumptions of enzyme specificity and N- 
glycan structure were correct. 

Direct Sequencing of the PSA Polysaccharide. Information about the structure of the
sugar moiety of PSA can be derived not only by isolating the sugar and sequencing it (such as
by using the above methodology); we can also derive information about the sugar
structure without removal from the protein. Figure 10 shows the results of sequencing the
sugar of PSA (Sigma Chemical). Figure 10 shows the results of enzymatic degradation of
the saccharide chain directly off of PSA. 50 pmol of PSA (~1.4 µg) was denatured by
heat treatment at 80°C for 20 minutes. Then the sample was sequentially treated with the
exoenzymes (B-D). After overnight incubation at 37°C, 1 pmol of the digested PSA was
examined by mass spectrometry. Briefly, the aqueous sample was mixed with sinapinic acid
in 30% acetonitrile, allowed to dry, and then examined by MALDI-TOF. All spectra were
calibrated externally with a mixture of myoglobin, ovalbumin, and BSA to ensure accurate
molecular mass determination. (A) PSA before the addition of exoenzymes. The measured
mass of 28,478 agreed well with the reported value of 28,470. (B) Treatment of (A) with
sialidase resulted in a mass decrease of 287 Da, consistent with the loss of one sialic acid 
residue. (C) Treatment of (B) with galactosidase. A further decrease of 321 Da indicated the 
loss of two galactose moieties. (D) Upon digestion of (C) with hexosaminidase, a decrease of 
393 Da indicated the loss of two N-acetylglucosamine residues. 
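
The sample quantity quoted for this experiment can be reproduced by a simple unit conversion from amount of substance to mass. The sketch below is illustrative only; the function name is hypothetical, and the molar mass is taken to be the measured PSA mass of 28,478 reported for panel (A).

```python
def picomoles_to_micrograms(amount_pmol, molar_mass_g_per_mol):
    """Convert an amount in picomoles to a mass in micrograms."""
    grams = amount_pmol * 1e-12 * molar_mass_g_per_mol
    return grams * 1e6


# 50 pmol of PSA at the measured mass of 28,478 Da.
psa_mass_ug = picomoles_to_micrograms(50, 28478)
print(f"50 pmol of PSA is about {psa_mass_ug:.2f} micrograms")
```

The result is close to the approximate figure given for the 50 pmol sample, and the same helper can be applied to the 1 pmol aliquot that was actually examined.
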

The protein had a measured mass of 28,478.3 (Figure 10A). Treatment of the intact
protein with sialidase resulted in a decrease of 287 Da, consistent with the loss of one sialic
acid residue (Figure 10B). Additional treatment with galactosidase resulted in a decrease in
mass of 321, consistent with the loss of two galactose residues (Figure 10C). Finally,
treatment with N-acetylhexosaminidase resulted in cleavage of two GlcNAc moieties (Figure
10D).

Glycotyping of PSA by EndoF2 Treatment. EndoF2 is an endoglycanase that clips
only biantennary structures. Tri- and tetraantennary structures do not serve as substrates for
this enzyme (Figure 11). In this way, EndoF2 treatment of a glycan structure, either attached
to the protein or after isolation, was used to identify branching identity. This becomes
especially important in light of the fact that aberrant changes in glycosylation patterns usually
result in increased branching. In addition, EndoF2 was used to cleave glycan structures that
were still attached to the protein of interest. Indeed, treatment of PSA with EndoF2 resulted
in a mass shift, consistent with the loss of a biantennary, complex type glycan structure. Figure
11 shows the results of treatment of biantennary and triantennary saccharides with
endoglycanase F2. (A) Treatment of the biantennary saccharide resulted in a mass decrease of
348.6, indicating cleavage between the GlcNAc residues. (B) Treatment of the triantennary
saccharide with the same substituents resulted in no cleavage, showing that EndoF2 primarily
cleaves biantennary structures. (C) EndoF2 treatment of heat denatured PSA. There was a
mass reduction of 1709.7 Da in the molecular mass of PSA (compare 11C and 11A),
indicating that the normal glycan structure of PSA was biantennary.

A computer system for implementing the system 100 of FIG. 1 as a computer program 
typically includes a main unit connected to both an output device which displays information
to a user and an input device which receives input from a user. The main unit generally
includes a processor connected to a memory system via an interconnection mechanism. The
input device and output device also are connected to the processor and memory system via the 
interconnection mechanism. 

It should be understood that one or more output devices may be connected to the
computer system. Example output devices include a cathode ray tube (CRT) display, liquid
crystal displays (LCD), printers, communication devices such as a modem, and audio output.
It should also be understood that one or more input devices may be connected to the computer
system. Example input devices include a keyboard, keypad, track ball, mouse, pen and tablet,
communication device, and data input devices such as sensors. It should be understood that the
invention is not limited to the particular input or output devices used in combination with the
computer system or to those described herein.

The computer system may be a general purpose computer system which is 
programmable using a computer programming language, such as C++, Java, or other 
language, such as a scripting language or assembly language. The computer system may also
include specially programmed, special purpose hardware. In a general purpose computer
system, the processor is typically a commercially available processor, of which the series x86,
Celeron, and Pentium processors, available from Intel, and similar devices from AMD and
Cyrix, the 680X0 series microprocessors available from Motorola, the PowerPC
microprocessor from IBM and the Alpha-series processors from Digital Equipment
Corporation, are examples. Many other processors are available. Such a microprocessor
executes a program called an operating system, of which Windows NT, Linux, UNIX, DOS,
VMS and OS8 are examples, which controls the execution of other computer programs and
provides scheduling, debugging, input/output control, accounting, compilation, storage
assignment, data management and memory management, and communication control and 
related services. The processor and operating system define a computer platform for which 
application programs in high-level programming languages are written. 

A memory system typically includes a computer readable and writeable nonvolatile
recording medium, of which a magnetic disk, a flash memory and tape are examples. The
disk may be removable, known as a floppy disk, or permanent, known as a hard drive. A disk
has a number of tracks in which signals are stored, typically in binary form, i.e., a form
interpreted as a sequence of ones and zeros. Such signals may define an application program
to be executed by the microprocessor, or information stored on the disk to be processed by the
application program. Typically, in operation, the processor causes data to be read from the
nonvolatile recording medium into an integrated circuit memory element, which is typically a
volatile, random access memory such as a dynamic random access memory (DRAM) or static
memory (SRAM). The integrated circuit memory element allows for faster access to the
information by the processor than does the disk. The processor generally manipulates the data
within the integrated circuit memory and then copies the data to the disk after processing is
completed. A variety of mechanisms are known for managing data movement between the 
disk and the integrated circuit memory element, and the invention is not limited thereto. It 
should also be understood that the invention is not limited to a particular memory system. 

The invention is not limited to a particular computer platform, particular processor, or
particular high-level programming language. Additionally, the computer system may be a
multiprocessor computer system or may include multiple computers connected over a
computer network. Each module (e.g., 108, 112) in FIG. 1 may be a separate module of a
computer program, or may be a separate computer program. Such modules may be operable
on separate computers. Data (e.g., 102, 110, 114, 116, and 118) may be stored in a memory
system or transmitted between computer systems. The invention is not limited to any
particular implementation using software or hardware or firmware, or any combination
thereof. The various elements of the system, either individually or in combination, may be
implemented as a computer program product tangibly embodied in a machine-readable
storage device for execution by a computer processor. Various steps of the process may be
performed by a computer processor executing a program tangibly embodied on a 
computer-readable medium to perform functions by operating on input and generating output. 
Computer programming languages suitable for implementing such a system include 
procedural programming languages, object-oriented programming languages, and 
combinations of the two. 

The present invention is not to be limited in scope by examples provided, since the 
examples are intended as a single illustration of one aspect of the invention. Various 
modifications of the invention in addition to those shown and described herein will become 
apparent to those skilled in the art from the foregoing description and fall within the scope of 
the appended claims. The advantages and objects of the invention are not necessarily 
encompassed by each embodiment of the invention. All references, patents and patent 
publications that are recited in this application are incorporated in their entirety herein by 
reference. 

We claim: 



